ED 111 050

TM 009 952

TITLE  Proceedings of the Invitational Conference on Testing Problems (New York, New York, October 29, 1949).
INSTITUTION  Educational Testing Service, Princeton, N.J.
PUB DATE  29 Oct 49
NOTE  86p.
EDRS PRICE  [illegible] Plus Postage
DESCRIPTORS  Achievement Tests; Aptitude Tests; Conference Reports; *Cultural Factors; *Factor Analysis; Information Dissemination; *Information Needs; Intelligence Tests; Personality Tests; Psychological Studies; Psychological Testing; Research Design; *Testing Problems; *Test Results; Test Validity
IDENTIFIERS  *Testing Industry; *Test Reporting

ABSTRACT
The conference panels were organized around three topics: (1) influences of cultural background on test performance; (2) uses and limitations of factor analysis in psychological research; and (3) information which should be provided by test publishers and testing agencies on the validity and use of their tests. Panelists for the first session included Anne Anastasi, Ernest A. Haggard, William Stephenson, and William W. Turnbull. Panelists for the second session were George K. Bennett, H. J. Eysenck, and Paul Horst. Panelists and their topics for the third session were: Aptitude and Intelligence Tests, Herbert Conrad; Achievement Tests, Paul Dressel; and Personality Tests, Laurance F. Shaffer. Brief discussions followed each of the three panels. (KH)








PROCEEDINGS

1949

INVITATIONAL
CONFERENCE
ON
TESTING
PROBLEMS

EDUCATIONAL TESTING
SERVICE

PRINCETON, NEW JERSEY

LOS ANGELES, CALIFORNIA






EDUCATIONAL TESTING SERVICE

BOARD OF TRUSTEES

Harold C. Hunt, Chairman

Raymond B. Allen
Joseph W. Barker
Oliver C. Carmichael, ex officio
Charles W. [illegible]
James B. Conant
George F. Zook, ex officio
Henry H. Hill
Katharine E. McBride, ex officio
Thomas R. McConnell
Lester W. Nelson
Edward S. Noyes

OFFICERS

Henry Chauncey, President
Richard H. Sullivan, Vice President and Treasurer
William Turnbull, Vice President
Jack Rimalover, Secretary
Catherine G. Sharp, Assistant Secretary
Robert F. Kolkebeck, Assistant Treasurer

COPYRIGHT, 1950, EDUCATIONAL TESTING SERVICE, 20 NASSAU STREET, PRINCETON, N.J.
PRINTED IN THE UNITED STATES OF AMERICA







INVITATIONAL CONFERENCE
ON
TESTING PROBLEMS

October 29, 1949

OSCAR K. BUROS, CHAIRMAN

¶ Influence of cultural background on test performance.

¶ Uses and limitations of factor analysis in psychological research.

¶ Information which should be provided by test publishers and testing agencies on the validity and use of their tests.

EDUCATIONAL TESTING SERVICE

PRINCETON, NEW JERSEY • LOS ANGELES, CALIFORNIA






FOREWORD 

The 1949 Invitational Conference on Testing Problems was enthusiastically received by those who were present. Since the excellent papers that were presented deserve a wider audience, they are being published in full, and along with them, the discussion from the floor that followed the formal presentations.

To Oscar K. Buros, who selected the topics, invited the participants, and conducted the meeting, goes full credit for the success of the conference. I would like to take this opportunity to express our grateful appreciation to him and to the speakers.

HENRY CHAUNCEY, President
Educational Testing Service



PREFACE 



The 1949 Invitational Conference on Testing Problems, sponsored by the Educational Testing Service, was held at the Roosevelt Hotel in New York City on October 29, 1949. This conference was attended by more than two hundred educators, psychologists, and personnel workers interested in measurement and evaluation techniques.

In preparing the program, an attempt was made to select topics somewhat controversial in nature. Such topics appeared especially appropriate, since it has always been customary at the Invitational Conferences to allot considerable time for questions and criticisms from the audience. The topics were selected only after consultation with persons representing other viewpoints in testing; the final responsibility, however, for the selection of topics and speakers was my own.

The following three topics were selected for the conference program:

(1) Influence of Cultural Background on Test Performance

(2) Uses and Limitations of Factor Analysis in Psychological Research

(3) Information Which Should Be Provided by Test Publishers and Testing Agencies on the Validity and Use of Their Tests

It was felt that these topics represented a sufficiently wide range to permit all conference participants to find at least a part of the program of interest and value.

Speakers were selected so as to represent a variety of viewpoints. We were especially fortunate that two distinguished British psychologists, H. J. Eysenck and William Stephenson, were in this country at the time of the conference and agreed to present papers. An effort was made to select speakers who had not been on the conference programs in recent years. In retrospect, I think that I should have invited a representative of a test publisher to present a paper on "Information Which Should Be Provided by Test Publishers and Testing Agencies on the Validity and Use of Their Tests." This omission on my part is especially interesting, since the conference was sponsored by the nation's largest test-construction and test-publishing organization. It speaks well for the Educational Testing Service that it made no attempt to influence me one way or the other in the selection of the topic for Panel III and in the selection of speakers.

I shall not attempt to summarize or assess the individual papers. In my opinion, all of the papers were of exceptionally high quality. This publication of the papers will permit others to evaluate the material for themselves.

I wish to express my gratitude to the speakers, to the discussants, to the numerous persons attending the conference, and to the Educational Testing Service for its sponsorship and efficient handling of the conference. I hope that the Educational Testing Service will continue to give us many more "Invitational Conferences on Testing Problems."

OSCAR K. BUROS
Chairman, 1949 Conference







CONTENTS 



PAGE

FOREWORD by Mr. Chauncey

PREFACE by Mr. Buros

PANEL I "Influence of cultural background on test performance."

Anne Anastasi, Fordham University ... 13
Ernest A. Haggard, University of Chicago ... 18
William Stephenson, Visiting Professor of Psychology, University of Chicago ... 23
William W. Turnbull, Educational Testing Service ... 29
DISCUSSION ... 35

PANEL II "Uses and limitations of factor analysis in psychological research."

George K. Bennett, Psychological Corporation ... 41
H. J. Eysenck, Institute of Psychiatry, University of London ... 45
Paul Horst, Educational Testing Service ... 50
DISCUSSION ... 57

PANEL III "Information which should be provided by test publishers and testing agencies on the validity and use of their tests."

APTITUDE AND INTELLIGENCE TESTS
Herbert Conrad, U.S. Office of Education ... 63
ACHIEVEMENT TESTS
Paul Dressel, Michigan State College ... 69
PERSONALITY TESTS
Laurance F. Shaffer, Teachers College, Columbia University ... 75
DISCUSSION

APPENDIX



PANEL I

Influence of Cultural Background
on Test Performance



Some Implications of Cultural Factors
for Test Construction



ANNE ANASTASI 



Any discussion of the influence of cultural background on test performance involves at least two distinct questions. First, to what extent is test performance determined by cultural factors? Secondly, what shall we do about it?

In considering the first question, it is important to remember at the outset that culture is not synonymous with environment. Although this distinction should be obvious, some writers apparently forget it when drawing conclusions about heredity and environment. For example, environmental factors may produce structural deficiencies which in turn lead to certain types of feeblemindedness. Recent research on such conditions as Mongolism, microcephaly, hydrocephaly, and intracranial birth lesions has yielded a growing body of evidence for the role of prenatal environmental factors in the development of these conditions. Yet these types of mental deficiency would certainly not be classified as cultural in their etiology. Nor are they remediable in the individual case by education or by the manipulation of other cultural factors. Of course, the environmental factors leading to the development of these structural deficiencies may themselves be culturally influenced in the long run. Some day, we may know enough about them to control them through maternal nutrition, prenatal medical care, and the like. But such a relation would represent an indirect cultural influence on behavior, mediated by structural deficiencies. Moreover, any such improvement in cultural conditions could have only a long-range effect and would not help the individual in whom the structural deficiency is already present.

Cultural factors do, however, affect the individual's behavior in many direct ways. Psychologists are coming more and more to recognize that the individual's attitudes, emotional responses, interests, and goals, as well as what he is able to accomplish in practically any area, cannot be discussed independently of his cultural frame of reference. Nor are such cultural influences limited to the more complex forms of behavior. There is a mass of evidence, both in the field observations of anthropologists and in the more controlled studies of psychologists, to indicate that "cultural differentials" are also present in motor and in discriminative or perceptual responses.

Now, every psychological test is a sample of behavior. As such, psychological tests will, and should, reflect any factors which influence behavior. It is obvious that every psychological test is constructed within a specific cultural framework. Most tests are validated against practical criteria which are dictated by the particular culture. School achievement and vocational success are two familiar examples of such criteria. A few tests designed to serve a wider variety of purposes and possibly to be used in basic research are, in effect, validated against other tests. Thus when we report that a given test correlates highly with the number factor, we are actually saying that the test is a valid predictor of the behavior which is common to a group of tests. If we had no number tests in the battery, we could not have found a number factor. The type of tests which are included in such a battery, however comprehensive the battery may be, reflects in part the cultural framework in which the experimenter was reared. It is obvious that no battery samples all possible varieties of behavior. And as long as a selection has occurred, cultural factors are admitted into the picture.

In the construction of certain tests, special consideration has been given to cultural group differences in the selection of test items. The practices followed with regard to items showing significant group differences may be illustrated, first, with reference to sex differences. Insofar as the two sexes represent sub-cultures with distinct mores in our society, sex differences in item performance may be regarded as cultural differentials. The Stanford-Binet (1) is probably one of the clearest examples of a test in which sex differences were deliberately eliminated from total scores. This was accomplished in part by dropping items which yielded a significant sex difference in per cent passing. It is interesting to note, however, that it did not prove feasible to discard all such items, but that a number of remaining items which significantly favored one sex were balanced by items favoring the other sex. The opposite procedure was followed in the construction of the Terman-Miles Interest-Attitude Analysis (2), as well as in other similar personality tests designed to yield an M-F Index. In these cases, it was just those items with large and significant sex differences in frequency of response which were retained.

Another type of group difference which has been considered in the selection of test items is illustrated by the so-called culture-free tests, such as the International Group Mental Test (3), the Leiter International Performance Scale (4), and R. B. Cattell's Culture-Free Intelligence Test (5). In these tests, a systematic attempt is made to include only content which is universally familiar in all cultures. In actual practice, of course, such tests fall considerably short of this goal. Moreover, the term "culture-common" tests would probably be more accurate than "culture-free," since at best, performance on such items is free from cultural differences, but not from cultural influences.

As a last example, let us consider socio-economic level as a basis for the evaluation of test items. One of the objectives of the extensive research project conducted by Haggard, Davis, and Havighurst (6) is to eliminate from intelligence tests those items which differentiate significantly between children of high and low socio-economic status. On the other side of the picture, we find the work of Harrison Gough (7) in the construction of the Social Status Scale of the Minnesota Multiphasic Personality Inventory. In this scale, only those items were retained which showed significant differences in frequency of response between individuals in two contrasted social groups.
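The screening step shared by these examples, comparing the per cent of two groups passing (or endorsing) an item and treating a statistically significant difference as grounds for action, can be sketched in modern terms as follows. This is a minimal illustration under assumptions introduced here, not the procedure actually used for the Stanford-Binet, the Terman-Miles analysis, or the Gough scale: the pooled two-proportion z test, the two-sided 1.96 cutoff, and the item counts are all hypothetical.

```python
import math

def two_proportion_z(passed_a, n_a, passed_b, n_b):
    """z statistic for the difference in proportion passing an item in groups A and B,
    using the pooled proportion for the standard error (an assumed choice of test)."""
    p_a, p_b = passed_a / n_a, passed_b / n_b
    pooled = (passed_a + passed_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Everyone passed (or everyone failed) in both groups: no difference to test.
        return 0.0
    return (p_a - p_b) / se

def flag_items(counts, n_a, n_b, z_crit=1.96):
    """Names of items whose per cent passing differs significantly between the groups
    (two-sided test; z_crit = 1.96 is an assumed cutoff)."""
    return [item for item, (passed_a, passed_b) in counts.items()
            if abs(two_proportion_z(passed_a, n_a, passed_b, n_b)) > z_crit]

# Hypothetical counts of examinees passing each item, 100 examinees per group.
counts = {"item1": (60, 40), "item2": (52, 48), "item3": (30, 45)}
print(flag_items(counts, n_a=100, n_b=100))  # ['item1', 'item3']
```

The computation only identifies the items. Whether a flagged item is then discarded (as in the Stanford-Binet revision and the Haggard, Davis, and Havighurst project) or deliberately retained (as in the M-F Index and Gough's Social Status Scale) is the separate decision the paper goes on to examine.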

It is apparent that different investigators have treated the problem of cultural differences in test scores in opposite ways. An obvious answer is that the procedure depends upon the purpose of the test. But such an answer may evade the real issue. Perhaps it is the purpose of the tests which should be more carefully examined. There seems to be some practical justification for constructing a test out of items which show the maximum group differentiation. With such a test, we can determine more clearly the degree to which an individual is behaviorally identified with a particular group. It is difficult to see, however, under what conditions we should want to study individual differences in just those items in which socio-economic or other cultural group differences are lacking. What will the resulting test be a measure of? Criteria are themselves correlated with socio-economic and other cultural conditions. The validity of a test for such criteria would probably be lowered by eliminating the "cultural differentials." If cultural factors are important determiners of behavior, why eliminate their influence from tests designed to sample and predict such behavior?

To be sure, a test may be invalidated by the presence of uncontrolled cultural factors. But this would occur only when the given cultural factor affects the test without affecting the criterion. It is a question of the [breadth] of the influence affecting the test score. For example, the inclusion of questions dealing with a fairy tale which is familiar to children in one cultural group and not in another would probably lower the validity of the test for most criteria. On the other hand, if one social group does more poorly on certain items because of poor facility in the use of English, the inclusion of these items would probably not reduce the validity of the test. In this case, the same factor which lowered the test score would also handicap the individual in his educational and vocational progress, as well as in many other aspects of daily living. In like manner, slow work habits, emotional instability, poor motivation, lack of interest in abstract matters, and many other conditions which may affect test scores are also likely to influence a relatively broad area of criterion behavior.

Whether or not an item is retained in a test should depend ultimately upon its correlation with a criterion. Tests cannot be constructed in a vacuum. They must be designed to meet specific needs. These needs should be defined in advance and should determine the choice of criterion. This would seem to be self-evident, but it is sometimes forgotten in the course of discussions about tests. Some statements made regarding tests imply a belief that tests are designed to measure a spooky, mysterious "thing" which resides in the individual and which has been designated by such terms as "Intelligence," "Ability Level," or "Innate Potentiality." The assumption seems to be that such "intelligence" has been merely overlaid with a concealing cloak of culture. All we would thus need to do would be to strip off the cloak, and the person's "true" ability would stand revealed. My only reaction to such a viewpoint is to say that, if we are going to function within the domain of science, we must have operational definitions of tests. The only way I know of obtaining such operational definitions is in terms of the criteria against which the test was validated. This is true whether a so-called practical criterion is employed or whether the criterion itself is defined in terms of other tests, as in factorial validity. Any procedure, such as the discarding of certain items, which raises the correlation of the test with the criterion enables us to give a more precise operational definition of the test. But we cannot discard items merely on the basis of some principle which has been laid down a priori, such as the rule that items showing significant group differences must be eliminated. If this procedure should lower the validity coefficient of the test, it could have neither practical nor theoretical justification.
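The principle that retention should rest on an item's correlation with the criterion, rather than on an a priori rule about group differences, can be made concrete with a toy computation. The sketch below is illustrative only: the six examinees, the three items (A, B, C), the criterion scores, and the supposition that item A is the one showing a group difference are all invented for the example.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def item_criterion_r(responses, criterion):
    """Correlation of each 0/1 item with the criterion (a point-biserial r)."""
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], criterion) for j in range(n_items)]

def validity(responses, criterion, keep):
    """Validity coefficient: correlation of the total score over the kept items with the criterion."""
    totals = [sum(row[j] for j in keep) for row in responses]
    return pearson(totals, criterion)

# Hypothetical data: one row per examinee; columns are items A, B, C (1 = passed).
responses = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
criterion = [1, 2, 3, 4, 5, 6]  # hypothetical criterion scores

print([round(r, 2) for r in item_criterion_r(responses, criterion)])  # [0.88, 0.29, 0.68]
print(round(validity(responses, criterion, keep=[0, 1, 2]), 2))       # 0.97 with all items
print(round(validity(responses, criterion, keep=[1, 2]), 2))          # 0.85 with item A dropped
```

With these invented numbers, item A has the highest item-criterion correlation; discarding it on a group-difference rule lowers the validity coefficient of the total score from about .97 to about .85, which is the kind of outcome the paper argues has neither practical nor theoretical justification.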



It is also pertinent to inquire what would happen if we were to carry such a procedure to its logical conclusion. If we start eliminating items which differentiate subgroups of the population, where shall we stop? We could with equal justification proceed to rule out items showing socio-economic differences, sex differences, differences among ethnic minority groups, and educational differences. Any items in which college graduates excel elementary school graduates could, for example, be discarded on this basis. Nor should we retain items which differentiate among broader groups, such as national cultures, or between preliterate and more advanced cultures. If we do all this, I should like to ask two questions in conclusion. First, what will be left? Secondly, in terms of any criterion we may wish to predict, what will be the validity of this minute residue?

REFERENCES

(1) McNemar, Q. The Revision of the Stanford-Binet Scale. Boston: Houghton Mifflin Company, 1942. Pp. 115.

(2) Terman, L. M., and Miles, C. C. Sex and Personality: Studies in Masculinity and Femininity. N.Y.: McGraw-Hill, 1936. Pp. 600.

(3) Dodd, S. C. International Group Mental Tests. Princeton Univ. Ph.D. Dissertation, 1924. Pp. 101.

(4) Leiter, R. G. The Leiter International Performance Scale. Santa Barbara, Calif.: Santa Barbara State College Press, 1940. Pp. 94.

(5) Cattell, R. B. A Culture-Free Intelligence Test: I. J. Educ. Psychol., 1940, Vol. 31. Pp. 161-179.
Cattell, R. B., Feingold, S. N., and Sarason, S. B. A Culture-Free Intelligence Test: II. Evaluation of Cultural Influence on Test Performance. J. Educ. Psychol., 1941, Vol. 32. Pp. 81-100.

(6) Haggard, E. A., Davis, A., and Havighurst, R. J. Some Factors which Influence Performance of Children on Intelligence Tests. [...], 1949, Vol. 3. P. 265.
Davis, A., and Havighurst, R. J. The Measurement of Mental Systems (Can Intelligence Be Measured?). Scient. Monthly, 1948, Vol. 66. Pp. 301-316.

(7) Gough, H. G. A New Dimension of Status: I. Development of a Personality Scale. Amer. Sociol. Rev., 1948, Vol. 13. Pp. 401-409.



Influence of Culture Background 
on Test Performance 



ERNEST A. HAGGARD






In considering the topic of the influence of culture background on test performance, I will limit myself this morning to some of the factors which influence the mental test performance of children in our society. In this connection, we are all convinced that how well a child does on an intelligence test is a function, in part at least, of a complex of genetic factors. But there is a good deal of disagreement on the extent to which they are reflected in scores on our present mental tests. In a strictly factual sense, no one can say. About all that we can be sure of is that there is no conclusive and definite relation between inheritance and performance on mental tests in terms of such variables as, for example, socio-economic class.

This has been, and still is, a focal point of many ardent and heated controversies. But because of the great number of genes that underlie the inheritance of the higher mental processes, because of the large number of generations of discrete stratified samples that would be necessary to demonstrate the specific inheritance of such processes, and because of the relatively rapid movement of individuals up and down the socio-economic ladder in America, it seems unlikely that any of us will see an empirical solution of this problem. Nor may we find a solution in the past. In the last few centuries in western European culture, there has been no stable stratification of intelligence in terms of socio-economic class, because of social upheavals like revolutions and wars. Probably the most stable group is that of royalty, the topmost level, but it is questionable whether this group has distinguished itself for intellectual achievement. The theoretical and mathematical solutions arrived at by such geneticists as Haldane, Hogben, and Huxley indicate no demonstrable difference in the inheritance of mental abilities in different socio-economic groups.

In addition to genetic factors, individuals also inherit, in a sense, a physical and social environment. The effects of these may be tested. For example, from studies with animal and human subjects, we already know something of the effects of serious nutritional deficiencies on the development of neural and other bodily tissues, and of their impairment of later adaptive behavior. Such early deficiencies and weaknesses may also lay the groundwork for various debilities in later life. The recipients of such handicaps are characteristically found in the lower socio-economic groups. If, then, a lower-class child were perfectly "normal" at birth, he still may, because of various factors which impede normal development, actually be subnormal by the time he is of testable age. But perhaps more important is the emotional deadening, the development of mental callouses, the disinterest in life, and the loss of willingness to respond to it, that often accompany severe deprivation. Little careful work has been done in this area, except perhaps for some of the studies of institutionalized and rejected children, and some of the foster home and twin studies. In any argument referring back to inheritance, however, such factors must be considered.

Everyone is aware of the influence of the social or cultural environment on test performance. No one would think of giving an intelligence test standardized on American children to a child in Bali, or France, or South Africa and expect the results to mean very much. No one would give such a test to a child on the other side of the ocean, but few have accepted the fact that the results of such testing might be invalid if the test is given to a child on the other side of the tracks. This is a basic point, and the source of much misunderstanding and faulty "information." Again, in terms of any genetic argument, one must consider the fact that children from privileged (or middle-class) homes receive a range of experiences and acquire a range of motivations which prepare them much more adequately for favorable performance on our present type of intelligence tests than is the case with lower-class children.

The thesis here is that we cannot assume the various subcultures in America to be comparable simply because of a common geographical boundary. Nor is it enough to say, "We consider all that when we evaluate the IQ of a lower-class or ethnic child." Why not? Because all too often the educational opportunities, from the early grades on, are determined by how well a child does on our present standardized tests, regardless of whether we intend them to be. And those who do poorly at first are often given inferior educational opportunities, so that a vicious circle is set up, and a great deal of potential ability is lost to our society.

Furthermore, test-constructors cannot, by themselves, design tests which are equally fair to all children in our society. One reason for this is that they, themselves, are middle-class individuals, buttressed by middle-class experiences, ways of thinking, and language forms. It is not surprising, then, that they construct tests which are saturated with middle-class vocabulary and language forms, that the experiences and knowledges tested are those with which most middle-class children are relatively familiar, and that they often standardize their tests in terms of such middle-class values as academic achievement. The problem then is whether these customary procedures are really appropriate for testing the mental abilities of children reared in a lower-class culture. From all we know of the wide range of differences between socio-economic groups in America, the answer is No.

The way to approach such a problem is to find out what the lower-class and ethnic cultures in our society are like. The best technique for finding out about other cultures is to begin by making anthropological and sociological field studies. A sizable number of such investigations have been made of various social and ethnic groups in America. Allison Davis, and the people working with him at the University of Chicago, have made use of these findings in their attempt to construct culturally-common intelligence tests for American children. It is manifestly impossible to have a culturally-free test, since symbol systems must be used, but we feel that it is possible to base a test on a range of experiences which is sufficiently common to all children in our society.

But before describing some of our procedures and findings, I would first like to indicate one or two possible effects stemming from an incomplete awareness of the many differences between, say, lower-class and middle-class groups. Generally, it seems that it has too often been assumed that a test would yield an adequate "measure" of intelligence because of the sheer elegance of the statistical computations involved. Such procedures are necessary, but are not a sufficient justification for an intelligence test. One must also examine carefully what each item measures, as well as the proportion and distribution of subjects passing an item, or a battery of items, and the correlation with some criterion at a given age level. On examination, a great majority of the items found in the present intelligence tests are biased in favor of the middle-class child and against the lower-class child. This is due largely to the particular type of information sampled and the vocabulary and verbal forms used, as well as to the artificial and academic nature of many of the item types themselves. An additional difficulty with many present tests is that they are standardized on a somewhat biased sample in terms of the total population. There are more middle-class children in school, especially at the older ages; it is much easier to motivate them to take the tests; and in general they are more cooperative and pleasant to work with. But if the testing conditions, the items, the test as a whole, or the sample on which they are standardized are biased in favor of middle-class children, then it follows that lower-class children would necessarily obtain lower "IQ" scores, even though they really were as gifted.

The research program of Davis and others at Chicago is directed toward the development of individual and group intelligence tests which are maximally fair to all social-class and ethnic groups in our society, that is to say, a culturally-fair intelligence test. There have been roughly four steps in this research program.

The first step involved extensive and intensive anthropological field studies, in which middle-class and lower-class ethnic and white groups were studied. Some graduate students lived in the homes of lower-class families for several months. Children from all groups were studied in their homes, schoolrooms, and neighborhoods. These children were interviewed on how and on what they spent their time, their range of experiences, how they used words, what words meant to them, how they solved test problems, and what things were important to them.

The second step involved an examination of the relative performance of some 5,000 children on each of the 460 items from ten frequently used intelligence tests. From these data, attempts were made to find out why large discrepancies existed in certain items and item types. They were frequently due to artifacts of middle-class verbal habits, or to differences in background experiences, or to differences in motivation to do well on the tests. It furthermore appeared that if the tests had been equally fair for children from extreme social-class groups in terms of familiarity and motivational factors, the wide performance differences would have been greatly reduced or wiped out.

The third step involved an experimental demonstration that such factors as differential familiarity, motivation, and the removal of middle-class artifacts did significantly reduce the difference in the performance of lower-class and middle-class children on the same mental test problems. In a pilot study, it was found that the differential performance of the two extreme social-class groups could be modified almost at will, either in terms of increasing or decreasing this difference. In a later and more extensive study, 656 children took part in a five-day experiment investigating the effects of the following variables on mental test performance: social class, practice, motivation, the form of the test, and its manner of presentation. The complete set of findings is too numerous and complex to report here, but there are a few which are especially worth mentioning. Between an initial test and a retest (with three practice periods on similar item types intervening), (a) the lower-class children showed as great an over-all gain in performance as did the middle-class children; that is, they learned or profited from their experiences as much as the middle-class children; (b) the lower-class children, when highly motivated on a retest of standard-type intelligence test items, did significantly better than the lower-class children not thus motivated; (c) many items were revised to remove the middle-class bias; and (d) the children from both social-class groups profited more from the experimental conditions with which they were more familiar in terms of previous experience, training, etc. But even though many traditional item types can be reworked to be less discriminating against lower-class children, without violating the essential nature or difficulty of the item, it was felt that in general they were too academic and artificial, and that a new approach should be taken, with the development of items which are not only fair in terms of the background experiences of all groups, but equally motivating as well.


The fourth step, which is being carried on at present, involves the construction and standardization of a new battery of individual and group intelligence tests for children of ages six to nine inclusive. Since this step is only partially completed, I am not able to describe the specific tests. However, according to a statement by Davis and Hess, the item types used in the new tests include: the understanding of physical principles, the classification of objects into categories selected by the child, memory processes, the drawing of inferences from given relationships, critical processes and the ability to verify solutions, general inductive and deductive reasoning, and a number of others. The items themselves involve problems and problem types which are about equally common to all socioeconomic groups, but problems which are not taught either in the home or the school. Consequently, they have to be solved almost entirely by the child's reasoning and creative ability, as it may be developed by his general experience.

The group tests are not completely analyzed as yet, but findings from the individual tests show a lack of difference in the average performance of children from extreme socio-economic groups on these non-academic problems. Yet at the same time, adequate age distributions were obtained, as were correlations between this test and school achievement scores for each socioeconomic group. The magnitudes of these correlations are comparable to those reported in the literature for the present intelligence tests.

In conclusion, I would like to return briefly to the question of the relation between the inheritance of mental ability and socioeconomic differences in intelligence test performance. In this connection, two points seem relevant. [...] intelligence test [...] are, in themselves, irrelevant [...] of genetic theory) are concerned; and [...] socioeconomic [...] on specific school [...] not find the [...] distribution of mental test [...] of socio-economic level. Therefore, the burden of proof for demonstrating that the upper socioeconomic groups inherit a complex of gene characteristics which are tied to superior mental ability, and lower socioeconomic groups inherit inferior genetic structures, rests on the shoulders of those who interpret such differences in mental test scores as being due to differential inheritance.






Influence of Cultural Background 
on Test Performance 



WILLIAM STEPHENSON



By cultural background, however defined, one refers to historically rooted matters. A white-collared Eton scholar and a back-street Brooklyn boy appear to be distinguishable, and yet also indistinguishable, in terms of their cultural backgrounds. Their sentiments, habits, attitudes, and affections may well be very different. Yet both speak English, and both live with a common heritage of law, religion, customs, and much else besides. They are educated differently, yet the ideals of Ancient Greek and Renaissance educators penetrate into the schools of both. In comparison with a Hindu, however, or a sedate Chinese boy for whom Taoism was a background until Communism burst in upon him, Europeans seem culturally different. And still more diverse must be the culture in and against which the native African child lives, or an Eskimo.

It has been difficult, however, if not impossible, to formulate concrete and operational postulates about such cultural agglomerations. It is permissible, perhaps, to distinguish between (a) educational influences, (b) socio-dynamic situations, and (c) the vague, historically determined culture patterns which, when evaluated, we grace with the name of heritage, and with which we are here to be concerned.

One might have thought that "culture" psychologists, whose very problem was to interpret these latter historical trends along the lines laid down by their mentor, Dilthey (1), would have provided something for us to bite upon, scientifically, by now. True, they produced an Oswald Spengler, with his notable Decline of the West, but I know of no testable hypotheses that reach into Spengler's Mayan, Babylonian, Graeco-Roman, or any other "civilisations." Yet interesting matters are at issue. We know that the ancient Athenians, after the Persian Wars, created the European mind out of a mere handful of human beings and a few square miles of territory; and our own Elizabethan Golden Age, after a hundred years of war, was born almost within sight of London. Moreover, whereas the Greeks and the Elizabethans called for the richest development of a man's ability, the current trend is rather to foster the trickling specialties and presumed aptitudes of our young. So that perhaps culture determines very largely what abilities we shall value and develop, rather than anything else at issue. There are strong suspicions that this is the case, as sociologists such as Mannheim, for example (2), have already suggested.

Up to very recently, however, testing, such as we are to consider, has hinged upon a null hypothesis. This is to the effect that cultural differences have little or no effect upon some really important dimensions of human personality. It is implied that there may be only a few such dimensions, perhaps only one, or two. We find the hypothesis almost unexpectedly, wherever we turn; at bottom it represents a belief that there must be general laws of personality which transcend cultures. By laws I mean theories, or synthetic propositions as Kaufmann and many modern philosophers would call them (3), which serve as models or growing points for hypotheses that can be put to experimental test. This null hypothesis lies behind the search for so-called "culture-free" intelligence tests; and indeed it would surely be imposing, if not important, if it could be shown that individuals drawn from widely different contemporary cultures, such as our English, African, American and Chinese boys, are alike in certain important essentials.

If this null hypothesis has finally to be rejected, we may still wonder whether there are on record any clear instances where important personality features have been shown to have, for the main part, a cultural determination. The possibilities of any essential interactional standpoint, however, can perhaps be discounted; for it scarcely seems reasonable to suppose that there can be much interaction between an ordinary person and his cultural milieu, such that each influences the other and everything is relative to everything else. For the individual is surely a puny speck against his cultural background. Exceptions to this, of course, are the great men and women of culture, a Plato, Aristotle, Buddha, or the like.

What have test performances, then, to say about these various matters? We should put aside, I think, any consideration of studies relating to heredity, or to the influence of socio-economic levels upon test performance, since these, except as controls, are scarcely pertinent to the questions at issue concerning culture.

Consider the null hypothesis first. One may begin by wondering whether a Kinsey Report for widely diverse national and cultural groups would read very differently in essentials from the American. Or, if we distinguish between thinking and intelligence, as Bartlett would have us do, interesting findings such as those of Carmichael (4) come to light. Using a verbal-projective test consisting of unfinished newspaper editorials on controversial topics, Carmichael showed that Cambridge graduates and English working-class men and women, all alike, intelligent and unintelligent, argued illogically, rationalized quite naively, projected and generally played havoc with anything that resembles the orderly procedures of an intelligence test. Would not the same apply the world over? Or consider another example. Thematic Apperception tests may well mirror the immediate behavioral stresses, strains, and preoccupations of different individuals, and to this extent very obvious social and perhaps culturally-determined differences may be brought to light. But if Sam has trouble with his wives, past and present, and Alexandrovic with his party affiliations, and Nagawooli with his goat, who is interested in these matters as such? One may interpret the results, of course, perhaps psychoanalytically, and so point to basic affinities of a dynamic kind underlying all these preoccupations. It may be shown in this way, for example, that children in slum areas appear to have far severer super-egos than children from better-to-do homes (5). But the psychoanalyst might well demur about such an apparent result, pointing out that only superficial indications of psychoanalytical dynamics are tapped by such tests, and that greater penetration might, rather, show everyone, of all cultures, alike in essentials: thus, the psychoanalyst, too, becomes involved in a null hypothesis for his fundamental postulates.

Along systematic lines, however, the best example I can offer is from work in the Spearman School. This began with a distinction (made on theoretical grounds which were rooted in late English Associationism) between noetic and anoetic processes. The former was represented formally by Spearman's g-factor, and the latter by all manner of specifics and group factors within the cognitive field of study.¹ Line next showed that "visual perception" in children paralleled their mental growth, that is, their mental age (against which, of course, the Binet tests had been validated originally). Stephenson (6), and Brown and Stephenson (7) followed by indicating that tests of this same visual perceptual material could be regarded as "pure" tests of Spearman g-factor, with these noetic implications. Finally, Fortes (8), who turned from the London group to become an anthropologist, found that African natives performed this kind of perceptual test quite as satisfactorily as whites. Fortes, however, was careful to do what others rarely achieve in test construction: he randomized the varieties and styles of perceptual material by selecting it from every known culture, past and present.²

Now I make no claim that this sequence of events and its outcome was other than tentative: it lacked the resources for test construction and standardization that America now affords, or that the Educational Testing Service so elegantly devotes to its tests. But its theoretical implications were clear, and obviously it was orientated towards this null hypothesis. Moreover, I propose not to enter into the appraisal of such results as we have available about "culture-free" or any other tests involving us in this null hypothesis: there is some evidence, such as that of Fortes, supporting the hypothesis for perceptual data, and much purporting to reject it. In the latter cases, however, so little has been done, usually, to randomize materials, or to take account of other controllable factors, that the evidence is at least dubious. I can only suspect that Fortes and the Spearman School were at least on the right lines to handle mainly visual perceptual material for some kind of crucial test of the null hypothesis.

¹ It is one of the ill consequences of a purely inductive approach to factor work, in Burt, Thurstone, and most text-books, to refer to the Spearman Theory of Two Factors without reference to the experiential matters and psychological theory that the factor theorems merely echoed, or paralleled as models. Thus, Spearman merely wished to deny the proposition that group factors could be found in the noetic field; he knew full well that they abounded in the anoetic.

² Stuart Dodd (9) attempted something of this kind for pictures of common objects and situations, for his so-called international test of intelligence. But the materials and problems were rooted in anoetic processes, and the test showed greater rather than less differences between racial groups. Similarly the styles of the fundaments used in the Penrose-Raven matrices (10), and in Cattell's (11) "culture-free" test, or Penrose's new perceptual test, are severely European and geometrical in form, and to this extent would be suspect wherever the null hypothesis is [...] supported. They would be suspect for other reasons, too, but I must leave this to one side for the present.

But now let us consider the other proposition, that culture has a decided, even a decisive, effect on human personality. For most of us this may seem completely obvious. It is surely easy enough to bring different national attitudes to light, as Cantril (12) is perhaps doing. Here I would like to be pardoned for using my own experimental observations, since I believe that they are methodologically more at the heart of what is involved.

I begin with the knowledge that it is the type psychologists, the Sprangers and the Jungs, and sociologists such as Mannheim (2) and Fromm (13) in recent years, who stress the influence of cultural background on present personality. But it appears that no self-respecting psychometrist, except myself, believes any more in types except as cuts across a normal distribution, made for convenience, much as we cut up the I.Q. scale into moron, feeble-minded, normal, and genius. Even so, I would ask you to re-open the whole matter of types, or at least to keep an open mind about it for the next few years, for I believe the psychometrists have been barking up quite the wrong tree. Matters look very different if one approaches types from a Q-technique standpoint (14).

It is a simple matter, for example, to show that more men in the United States are likely to be of a type X, that we might call "extrovert," than of a type Y, that we might call "introvert." The opposite is the case for women.

But the main types can be demonstrated for any small number of persons, for example for any ten of you in this room, without operational reference to any other persons in or out of the room. Indeed we can say something about the matter for only one person if need be: thus, given a "population" of 300 traits chosen at random from a Jungian universe of such traits (I have 2,000 traits in such a universe), I might invite the one person (a) to appraise himself with the traits, (b) then, having done this, to give an account of what he believes an ideal introvert to be, and (c) finally to give an account of what he believes an ideal extrovert to be. The correlations between (a), (b) and (c) for N = 300 traits will indicate whether our one person (if he is sophisticated like ourselves or college students) is of introverted or extroverted type.³

But for the moment we need only examine the implications of such Q-technique findings, and its approach, for our preoccupation with culture. Suppose that, in terms of Q-technique, types are now demonstrable (as indeed they are). In the case of Introversion-Extroversion such types were rooted, for Jung (15), almost wholly in cultural background. Jung traced the matter back into pre-Christian history; into the disputes and castigations of a Tertullian and an Origen of some eighteen centuries ago; into Schiller's idealization, many centuries later, of the "Grecian heaven"; into the massive folklore and poetry of a Faust, a [...], or a [...]; and so down into the very tough-mindedness of James's Pragmatism.

Now it may stretch one's credulity, if not one's imagination, to accept the proposition that these same roots find their way into the personality of our one person whose correlations have just been referred to. Yet clearly he operated with my little test, and it is not really difficult to see that his evaluations of the traits may very well stem just precisely into or from such historically persistent strands. At the outset he was asked merely to give a description of his own personality in terms of the 300 innocuous-looking traits. He had no idea that I was going to ask him, subsequently, to describe an ideal or typical introvert and an extrovert. Nor did the traits suggest that anything of the kind was likely to be involved. Clearly some kind of ostensible learning has mediated, and the culture psychologist was perhaps quite correct to trace this not only into current culture (plus learning in an ostensible manner), but also to seek its roots in cultural history.

The psychometrist, however, has not sought to represent such types but to measure isolated, perhaps a-historical or immediate, functions or factors, such as introversion-extroversion or the like, much as one measures an electric current. At best the result has been not one function or factor, but several, to judge for example from Guilford's studies. One doubts, however, whether anyone feels happy about [...]. Indeed different forms of [...] can provide rather different apparent factors.

The situation is very different if one seeks to represent types as such statistically. For one can then operate with the types, that is, subject them to experimental tests, even for only one person at a time.

One can see the fashioning of such types, interestingly enough, in current American culture. Erich Fromm (13), for example, in his Man for Himself, offers a description of the supposed "market" type of personality, which he ascribes to Americans who apparently want to sell everything, including their own personalities. In terms of Q-technique I have recently reduced Fromm's notions to some kind of orderly operational testing, and can readily demonstrate, and thus verify, his "market" characterization of Americans. This, apparently, is fashioned by your culture.

³ Thus, for the following quite typical data, the person is very likely to be introvert in type (or thinks he is):

                      (b) Ideal    (c) Ideal
    (a) Self            +.50         -.55
    (b) Ideal                        -.90
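The one-person Q-technique procedure described above can be illustrated with a short sketch. One person describes the same traits three times, as (a) himself, (b) an ideal introvert, and (c) an ideal extrovert, and the three descriptions are then correlated with one another across traits rather than across people. This is a hypothetical illustration, not Stephenson's own procedure or data: the six traits, the 0-10 rating scale, the use of Pearson's r, and the rule that labels the person by whichever ideal his self-description correlates with more highly are all assumptions made here for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def q_type(self_desc, ideal_introvert, ideal_extrovert):
    """Correlate three descriptions of the same traits with one another, across traits.

    Returns r(a,b), r(a,c), r(b,c) and a label. The labelling rule (pick the ideal
    the self-description correlates with more highly) is a hypothetical simplification.
    """
    r_ab = pearson(self_desc, ideal_introvert)
    r_ac = pearson(self_desc, ideal_extrovert)
    r_bc = pearson(ideal_introvert, ideal_extrovert)
    label = "introvert" if r_ab > r_ac else "extrovert"
    return r_ab, r_ac, r_bc, label


# Hypothetical ratings of six traits on a 0-10 scale (not Stephenson's data;
# the procedure in the text used 300 traits drawn from a larger trait universe).
self_desc = [7, 5, 4, 3, 6, 5]         # (a) the person describing himself
ideal_introvert = [9, 8, 2, 1, 7, 3]   # (b) his picture of an ideal introvert
ideal_extrovert = [2, 2, 9, 8, 3, 6]   # (c) his picture of an ideal extrovert

r_ab, r_ac, r_bc, label = q_type(self_desc, ideal_introvert, ideal_extrovert)
print(f"r(a,b) = {r_ab:+.2f}, r(a,c) = {r_ac:+.2f}, r(b,c) = {r_bc:+.2f}: {label}")
```

In a real application each vector would have one entry per trait (300 in the case described), and deciding a type would need more care than comparing two coefficients.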

But what we prove is that such-and-such men are [...] in type. It is quite another matter to test them for any underlying functions in terms of individual differences. By the very postulates one uses, in the latter case, one throws away any possibility of achieving concrete types as such.

In conclusion, then, cultural influences can be brought into full view in the typification of human beings, as Spengler, Jung, and others down to Fromm have seen. I state it as a testable postulate that any systematic quantification in terms of individual differences (which we are unfortunately wont to regard, almost as a myth, as the exclusive concern of our testing procedures) cannot represent such typification, and certainly is in no way needed for its achievement.

As I see the issues, therefore, in the very broadest manner I am prepared to examine the null hypothesis that cultural background is neutral, or can be randomized, with respect to some of our major psychological preoccupations. These are functions such as noesis, libido, and the like. As an offset, it is perhaps as well to remember that society also determines what abilities will be valued, and what discounted. But by the same token it is now easy to demonstrate that man's personality types are fashioned very probably in terms of the culture in which he lives.

REFERENCES

(1) Hodges, H. A. Wilhelm Dilthey: an Introduction.

(2) Mannheim, K. Man and Society. N.Y., 1940.

(3) Kaufmann, F. Methodology of the Social Sciences. N.Y., 1944.

(4) Carmichael. British J. of Psychology (Gen. Sec.), 1943.

(5) [...], L. Ph.D. Thesis, University of Oxford, 194[...].

(6) Stephenson, W. J. Educ. Psychol., 1931, Vol. 22.

(7) Brown, and Stephenson, W. Chapter VII of Brown and Thomson, The Essentials of Mental Measurement. N.Y., 1940.

(8) Fortes, M. Ph.D. Thesis, University [...]

(9) Dodd, S. International Group Mental Tests. Ph.D. Thesis, Princeton University, [...].

(10) Penrose and Raven. British J. Med. Psychol., 1941.

(11) Cattell, R. B. A Culture-Free Intelligence Test. J. Educ. Psychol., 1940.

(12) Cantril, H. UNESCO Studies, 1949.

(13) Fromm, E. Man for Himself. N.Y., 1947.

(14) Stephenson, W. (See bibliography in Wolfle, D. Factor Analysis to 1940. Chicago U.P.)

(15) Jung, C. G. Psychological Types. N.Y., 1915.






Influence of Cultural Background
on Predictive Test Scores



WILLIAM W. TURNBULL



For convenience in attacking the broad question before this panel I should like to limit my discussion to tests used for the purpose of prediction. In imposing this limitation, however, I feel that I am not greatly restricting the field of inquiry, since in the final analysis most test scores derive their utility from their predictive significance.

If we consider tests whose use is frankly predictive, two questions of interest are: first, when people of different cultural backgrounds take the same test, how do their scores compare? And second, are differences in level of test performance of different cultural groups associated with similar differences in the subsequent behavior those scores were supposed to predict?

In an approach to the first of these questions, Henry Chauncey and I carried out some years ago a study (as yet unpublished) to discover the manner in which students from different geographical areas and different sizes of communities differed in their performance on types of questions commonly used in tests of scholastic aptitude.

The test used was the first Army-Navy College Qualifying Test. This examination included four sections, comprising verbal, scientific, reading, and mathematical material, respectively. The verbal section included questions relating to word meaning, word usage, and the like, in the form of opposites, analogies, and definitions in completion form. The second, or "scientific" section, was composed of questions of the so-called common-sense science type. The technical information needed to answer them was not great, and for the most part intelligent scientific interest and alert observation would prove as valuable as scientific training. The third section of the test consisted of paragraphs of rather general nature, each followed by questions on its content, while the mathematical section (the last section of the test) was designed to test numerical reasoning, presupposing a background of arithmetic, elementary algebra, and rudimentary geometry. The test was given in 1943 to over 300,000 students all over the country, as a screening device for the college training programs of the Army and Navy. All of the people tested were male, were 17-21 years of age, and had reached or passed the senior year of secondary school.

From the mass of answer sheets, eight subgroups were segregated, first by taking all answer sheets from the four regions of New York State, Alabama and Georgia combined, Iowa and Nebraska combined, and California; and then by separating within each region the answer sheets of students in large and in small communities. A large community was defined as one whose population was 150,000 or more and a small community as a non-suburban community below 5,000 in population. For convenience these groups were called urban and rural respectively. (I will readily agree that these terms are not rigorous, since not all students attending school in a community of fewer than 5,000 souls come from farm homes, although a substantial proportion of them do.) Finally, from each of the eight groups (two sizes of community within four geographical areas) a random sample of 500 answer sheets was drawn.

Please note particularly that the samples were far from random or representative samples of the total student population of the age range 17-21 in the four regions. They represent merely the extremes on a scale of "population size" within groups that had voluntarily taken the qualifying tests, and there is no basis for ascribing representativeness to the samples.

The main results of this study were put in graphic form, and I believe there are sufficient copies here for each person present.

[Figure 1. Army-Navy College Qualifying Test: Rural-Urban Differences by Regions. Panels: New York, Ala.-Ga., Iowa-Neb., California. Vertical axis: deviation from the total group, in tenths of a standard deviation of the total group. Legend: Urban (solid line), Rural (dotted line).]

Looking first at the sheet headed "Army-Navy College Qualifying Test: Rural-Urban Differences by Regions" (Fig. 1), we see first the very conspicuous depression of all values for Alabama-Georgia, particularly in the rural areas (represented by the dotted line). Since the scale here is expressed in tenths of a standard deviation for the total group, it is evident that the rural Alabama-Georgia candidates scored about three-fourths of a sigma below their rural New York cousins. For the urban groups the regional differences are less striking, but are still present. Next notice that the solid lines tend to slope downward to the right, while the dotted lines tend to slope upward to the right. This is seen clearly from the chart where the composite rural-urban comparison is made (see Fig. 2), with all four regions averaged together. Evidently the students from large communities were much more facile verbally than those from small communities, whereas in mathematical ability their superiority was slight, and in terms of ability to answer common-sense science questions the two groups were equal.
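The charts express each subgroup's standing as a deviation from the total group, in tenths of the total group's standard deviation; on that scale a gap of 7.5 units is the "three-fourths of a sigma" mentioned above. A minimal sketch of the conversion follows, assuming the population standard deviation is used and that, unless a separate total group is supplied, the subgroups pooled together stand in for it. The function name and all scores are hypothetical.

```python
from statistics import mean, pstdev


def tenths_of_sigma(groups, total_scores=None):
    """Express each group's mean as a deviation from the total-group mean,
    in tenths of the total-group standard deviation.

    groups: dict mapping a group name to a list of raw scores.
    total_scores: scores of the whole tested group; if omitted, the groups
    are pooled and used as the total group (an assumption).
    """
    if total_scores is None:
        total_scores = [s for scores in groups.values() for s in scores]
    total_mean = mean(total_scores)
    total_sd = pstdev(total_scores)  # population SD, an assumption
    return {name: 10 * (mean(scores) - total_mean) / total_sd
            for name, scores in groups.items()}


# Hypothetical raw scores for two subgroups (illustrative only, not the study's data).
groups = {
    "rural, region A": [38, 42, 45, 47, 50],
    "rural, region B": [46, 50, 53, 55, 58],
}
devs = tenths_of_sigma(groups)
for name, d in devs.items():
    print(f"{name}: {d:+.1f} tenths of a sigma")
gap = (devs["rural, region B"] - devs["rural, region A"]) / 10
print(f"gap between the two groups: {gap:.2f} sigma")
```

Dividing a difference on the tenths scale by 10 gives it in sigma units, which is how a reading such as 7.5 becomes three-fourths of a sigma.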

An analysis of variance showed that 
the differences in total test perform- 
ance accord- *g to geographical regic.t 



aanY-NAVY ooLtiii ouatiPYiNa tcst 
mmaip^ufiiAii DirreaeNces 













J 






Stondard 


J 




o( 


0 


Tolul 




Gfowp 


J 




-t 




- s 




•4 
a 





































































































L«9tfid 

MM Urban 
Rwfol 














7 



1949 INVITATIONAL CONFERENCE 

m 

were sunsticilly tigniAcantt M werf merely on the tjrpe of ten mttcriil, 

the differcncei in periormMce accord- ti oCcrucial (mportancc for the trgu- 

tng to me o( communiity. There were meht at to the cause of the dUfertncea. 

Agnificant differences between :ute$ For if the differences were common 

in the relationahip of the abfltty of to all iftim of OM type or one iactor, 

people from Urge communttiei lo that we might argue that tha dttferent 

of people from mall oommunitiea. groi4Miha4inliiritt4 different patterns 

And finally, rural-urban differences varied significantly according to the kind of test material used, as illustrated in the second graph.

As a further step in this investigation, separate item analyses were completed for the eight subgroups on ten items from each test section in an attempt to discover whether the lower test scores of the rural group resulted from generally poorer performance on the items within a given test section or from failure on particular items. I shall not take time to report in detail on our findings, but an analysis of variance showed that the item difficulty differences between the groups varied significantly from one item to another. That is, the order of item difficulty was not the same for boys from small communities as for boys from large cities. In the case of the verbal section of the test the variance of item difficulties by community sizes was considerably larger than would have been required for significance at the 1% level of confidence, and for the other three sections was significant at better than the 5% level of confidence. Similarly, the differences in performance of geographically distinct groups varied significantly from item to item.

The fact that the differences between groups depend on the individual test questions considered, rather than . . . of abilities and that these patterns reflected themselves directly in the test section scores. But one would scarcely argue for differential inheritance of ability to solve individual questions within such a homogeneous factor as verbal, where the dependence of rural-urban differences on the particular test question asked was most clearly indicated. The conclusion must be that, whether or not there were inherited mental differences between our groups from large and small communities, and from state to state, environmental differences must have caused certain of the differences in test performance: specifically, the intergroup differences in order of item difficulty within a single test section.

If one grants that some test questions are relatively harder than others for people in a specified cultural group, the next question is: what shall we do about it? Should we build tests that minimize the intercultural difference in scores, or that maximize that difference, or shall we trust to chance to bring us out somewhere in the middle? It is my contention that on a predictive test any score difference between groups whose backgrounds differ should be judged not good or bad, not right or wrong, but useful or not useful, valid or invalid for the prediction of future behavior. We must specify the criterion we wish to predict, and then justify intergroup equality or inequality of test scores on the basis of its effect on prediction.
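The item analysis described above asks whether the gap between two groups is the same on every item or varies from item to item. The paper does not give the computational details of its analysis of variance, so what follows is only a minimal sketch of one standard way to pose that question for two groups, swapped in for illustration: a precision-weighted heterogeneity statistic (of the Cochran Q type) on the item-by-item differences in proportion passing. The function name `heterogeneity_q` and all data values are hypothetical. The sketch treats items as independent, which they are not when the same examinees answer every item, so its result should be read as rough.

```python
from typing import Sequence


def heterogeneity_q(p_a: Sequence[float], p_b: Sequence[float],
                    n_a: int, n_b: int) -> tuple[float, int]:
    """Test whether the group difference in item difficulty is constant.

    p_a[i], p_b[i]: proportion of group A and group B passing item i.
    n_a, n_b: examinees per group (assumed the same for every item).
    Returns (Q, df). If the A-minus-B difference were identical on every
    item, Q would be roughly chi-square distributed with df = items - 1.
    """
    if len(p_a) != len(p_b) or len(p_a) < 2:
        raise ValueError("need two equal-length lists of at least two items")
    diffs, weights = [], []
    for pa, pb in zip(p_a, p_b):
        # Sampling variance of the difference between two independent proportions.
        var = pa * (1 - pa) / n_a + pb * (1 - pb) / n_b
        if var <= 0:
            raise ValueError("a proportion of exactly 0 or 1 gives zero variance")
        diffs.append(pa - pb)
        weights.append(1.0 / var)
    # Precision-weighted average difference over all items.
    mean_diff = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    # Weighted squared deviations of each item's difference from that average.
    q = sum(w * (d - mean_diff) ** 2 for w, d in zip(weights, diffs))
    return q, len(diffs) - 1


# Hypothetical proportions passing ten verbal items, city boys vs. rural boys.
city = [0.91, 0.85, 0.80, 0.77, 0.70, 0.66, 0.60, 0.52, 0.45, 0.38]
rural = [0.88, 0.70, 0.78, 0.55, 0.66, 0.40, 0.58, 0.30, 0.43, 0.20]
q, df = heterogeneity_q(city, rural, n_a=500, n_b=500)
print(f"Q = {q:.1f} on {df} df")
```

With ten items there are 9 degrees of freedom, for which the tabled chi-square values are about 16.92 at the 5% level and 21.67 at the 1% level; a Q well above these corresponds to the finding that the order of item difficulty differs between the groups.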

Relatively little attention has been given to the question of the effect on prediction of score differences between cultural groups. The results of a few investigations are available, and they show in general that the rather haphazard mixing of items favorable to various subcultures has so far resulted in tests that differentiate usefully among cultural groups, if one's purpose is to predict the criteria used in these studies.

In a study shortly to be published by Frederiksen and Schrader, a comparison was made between predicted achievement and actual achievement of veteran and non-veteran students in their first year of college. In four institutions a prediction of the freshman grades was made from scores on an aptitude test, using the ACE Psychological Examination in three instances and the College Board Scholastic Aptitude Test in the fourth. Within each group, veteran and non-veteran, a division was then made on the basis of background variables for the purpose of discovering whether or not they were associated with a tendency to accomplish more in college than the test scores predicted. If the aptitude tests were assigning improperly low scores to students of lower socio-economic status, one would expect such students to over-achieve (in relation to predictive test score) on the criterion variable, whatever it might be. Such was not the case in the four institutions studied. Background data were available on income of family head, formal education of father, and size of community. No clear trends emerged to show that the student's position relative to these variables was related to his tendency to over-achieve, whether veteran or non-veteran students were considered.
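The over-achievement comparison reported here amounts to fitting a prediction of grades from aptitude scores and then asking whether the prediction errors average out differently in different background groups. A minimal sketch, assuming a single straight-line prediction fitted by least squares to the whole sample (the study made its predictions within each institution, and its exact procedure is not given here); the function names and data are hypothetical:

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_residual_by_group(scores, grades, groups):
    """Average of (actual - predicted grade) within each background group.

    A clearly positive mean would mean that group over-achieves relative to
    its aptitude scores, as expected if the test underrated it; a mean near
    zero shows no such tendency.
    """
    intercept, slope = fit_line(scores, grades)
    totals, counts = {}, {}
    for s, g, grp in zip(scores, grades, groups):
        residual = g - (intercept + slope * s)
        totals[grp] = totals.get(grp, 0.0) + residual
        counts[grp] = counts.get(grp, 0) + 1
    return {grp: totals[grp] / counts[grp] for grp in totals}


# Hypothetical aptitude scores, freshman grade averages, and community size.
scores = [40, 55, 60, 72, 48, 65, 58, 80]
grades = [1.9, 2.4, 2.6, 3.1, 2.2, 2.7, 2.5, 3.4]
community = ["small", "large", "small", "large", "small", "large", "small", "large"]
print(mean_residual_by_group(scores, grades, community))
```

Because least-squares residuals sum to zero over the whole sample, the group means are informative only relative to one another.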

In the study of the College Qualifying Test whose results I have reported, no criterion data were obtained. Such data were, however, gathered in an unpublished study by Conrad and Robbins, who used the same qualifying test to predict achievement after two semesters of the V-12 program. They then attempted to account for the errors in prediction on the basis of educational handicap in high school, using as the measure of educational handicap the average teacher's salary in the school system from which the individual came. The hypothesis tested was that teacher's salary should correlate negatively with over-achievement: the lower the salary, the greater the excess of achievement over prediction. Out of seventeen colleges studied, negative correlations were found in six, a zero correlation in one, and positive correlations in ten, showing that whatever educational handicap was reflected in the aptitude scores was reflected to at least as great a degree in first year college achievement.
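The Conrad and Robbins analysis can be sketched the same way, one college at a time: compute each student's prediction error from the qualifying test, correlate it with the average teacher's salary of his home school system, and tally the sign of the correlation across colleges. This is a hypothetical illustration, not the authors' procedure; in particular, the `zero_band` threshold used to call a correlation "zero" is an assumption, since the text does not say how the one zero correlation was classified.

```python
from statistics import mean


def residuals(scores, achievement):
    """Actual minus least-squares-predicted achievement for one college."""
    mx, my = mean(scores), mean(achievement)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, achievement))
    sxx = sum((x - mx) ** 2 for x in scores)
    slope = sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(scores, achievement)]


def pearson_r(x, y):
    """Product-moment correlation (assumes neither variable is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tally_salary_correlations(colleges, zero_band=0.05):
    """colleges: iterable of (scores, achievement, salaries), one per college.

    The hypothesis predicts negative correlations between salary and
    over-achievement (low salary, achievement above prediction).
    """
    tally = {"negative": 0, "zero": 0, "positive": 0}
    for scores, achievement, salaries in colleges:
        r = pearson_r(salaries, residuals(scores, achievement))
        if r < -zero_band:
            tally["negative"] += 1
        elif r > zero_band:
            tally["positive"] += 1
        else:
            tally["zero"] += 1
    return tally


# Three hypothetical colleges sharing the same scores and achievement.
scores = [1, 2, 3, 4]
achievement = [3, 3, 5, 9]  # residuals from the fitted line: +1, -1, -1, +1
colleges = [
    (scores, achievement, [5, 10, 10, 5]),   # low salary goes with over-achievement
    (scores, achievement, [10, 10, 5, 5]),   # salary unrelated to residuals
    (scores, achievement, [10, 5, 5, 10]),   # high salary goes with over-achievement
]
print(tally_salary_correlations(colleges))  # → {'negative': 1, 'zero': 1, 'positive': 1}
```

A preponderance of positive correlations, as in the seventeen colleges reported, runs against the hypothesis that the test understated the ability of students from low-salary school systems.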

These findings suggest that intergroup differences on scholastic aptitude tests, when the grouping is based on factors usually associated with cultural or educational handicap, are valid for the prediction of college freshman grades. Admittedly this is a limited criterion, but the nature of validation demands that we investigate our criteria one by one.

The findings based on freshman grades were corroborated in a further 8-college study reported by Conrad and Robbins at the 1947 meeting of the American Educational Research Association. In that study the authors found that errors in prediction of fifth-term college work from aptitude test scores were not related either to average teacher's salary or to size of community from which the student came: that is, whatever handicap the factors of teacher's salary or community size may reflect manifested itself as strongly in achievement through the fifth college term as it did in the aptitude scores obtained before entrance to college.

I wish I could report findings of similar studies aimed at longer-range criteria of greater social significance. Unfortunately, however, I know of no existing data that will help us answer the question of the validity, for such criteria, of the intergroup differences under consideration.

To summarize, the study of intergroup differences on the Army-Navy College Qualifying Test, reported earlier in this paper, illustrates the magnitude of the score differences obtained when one administers a typical scholastic aptitude test to groups of high-school graduates in various geographical regions and from communities of different sizes. The analysis of differences on individual test questions points to the causal influence of cultural differences in producing score differences. Other investigations have uncovered evidence that such score differences have some predictive utility: i.e., that when college grades through the fifth term are accepted as a criterion, the scores reflect accurately the performance of the various subgroups. What we need in order to provide a more generally useful answer to the question of predictive utility are studies in which test scores are used to forecast long-term life success. Only studies of this kind can tell us how great should be the intergroup differences on predictive tests.





PARTICIPANTS:

OSCAR K. BUROS, ANNE ANASTASI, WILLIAM STEPHENSON, HAROLD GULLIKSEN, ERNEST HAGGARD, HUGH M. DAVIDSON, WILLIAM W. TURNBULL, DOUGLAS E. SCATES.



CHAIRMAN BUROS: First I will give the members of the panel an opportunity to raise any questions they have with the other members of the panel.

DR. ANASTASI: I actually agree with what has been said by most of the speakers. I would like to make three points in this connection. First of all, I think that studies of cultural differences in test performance are extremely important. The sort of study that Dr. Turnbull has just reported and that Drs. Haggard, Davis and Havighurst have done on how cultural groups differ in test performance is very important in helping us to understand what the existing differences are and to what extent cultural background affects performance.

I think, too, that studies of cultural similarities by such tests as Dr. Stephenson mentioned, studies that are using tests which are culturally neutral or culturally randomized and which therefore enable us to focus our attention on what these cultures have in common, are also important.

I believe, however, that such studies are quite apart from the problem of constructing tests. When we construct tests, I would agree thoroughly with Dr. Turnbull that the criterion is the only thing we can go by. As for the use of tests for purposes other than prediction, I still say that we must provide an operational definition of what we are testing. And unless we have an operational definition in terms of a criterion, I do not know what it would be.

DR. STEPHENSON: I always like the chance to say an extra word, if I may say just one.

I should hate to think that I leave you with the impression that technical matters are of no consequence. Clearly, if you are making tests for practical purposes, these are important. I would merely like to support Dr. Turnbull to that extent. I found results similar to those described by Turnbull for the British Army and Air Force. When I took V and G and K tests, the G test did not differentiate men and women, for instance, in the armed forces, on some eighty thousand samples, but the V test certainly did so.




The K test was so badly done by women that it was almost unbelievable. I therefore know that these facts are there to look for, but I should still be wondering whether I shouldn't plan it out in some way, you see; why should I be looking for just V and R and S and M? There should be some main dimensions involved, and it is there that I think a little theory rather than a mere scramble would, perhaps, help.

Otherwise, I have nothing but admiration, as I said, for the elegance, and so forth, for which all the technical matters of test construction are accounted. I still have my problem that there are some very big issues, like the one of the Greek culture and the Golden Era; they are a phenomenon that has happened, and it would be so nice if we could find something that would alter things now so that we might have another sort of . . . ; that would seem like a completely fantastic dream, though, I know.

CHAIRMAN BUROS: The meeting is open to questions from the floor.

DR. GULLIKSEN: Mr. Chairman, I was interested in the emphasis on validity from the speakers, and I want to ask Dr. Haggard a question. I thought I detected one sentence in his talk that dealt with the question of validity. To what extent have your studies dealt with not only the validity of your test for different socio-economic groups, but also the validity of the older tests which you are criticizing? And how do these validities of the two types of test compare for different economic groups, for predicting various criteria? Could you give a summary of such findings, please?

DR. HAGGARD: Yes, but the data that will enable me to answer your question are just coming in, and are being analyzed at present, so my remarks must be somewhat . . . . As one measure of validity for each social class group, we used a reading achievement test, since it was the one test given by the school system to all the children in the study. Our individual intelligence test predicted performance on reading achievement as well as, or slightly better than, the Kuhlmann-Anderson and the Primary Mental Abilities tests for each social class group.

The use of such criteria for determining validity, however, has not been our main interest. Rather, we had in mind an approach which may appear a little naive, namely, reliance on face validity. In other words, we selected items which met the criterion, "This test item should be passed by a smart boy and failed by a stupid boy, regardless of his socio-economic status." The decision for the inclusion or exclusion of an item was made by a group of qualified experts in such fields as education, psychology, and anthropology.

If, for each social class group, our tests can predict school achievement as well as present intelligence tests, it may be said that we don't need to worry. But in saying this, I mean to place worry in quotation marks, because we don't believe that such a measure is a very meaningful criterion of validity, although it is about all we have if we must rely on such objective data. We don't like it because the usual criterion measure is heavily biased in favor of middle-class children. In spite of this, we found correlations around .50 between our test and the measure of school achievement, which was about the same as that for the standardized tests with our subjects. A correlation of this size, however, leaves enough variance unaccounted for so that, even though our tests do not correlate with socio-economic status, they may still correlate as highly . . . social class bias is minimized in our tests, they actually provide a more nearly valid measure of intellectual . . .

CHAIRMAN BUROS: Are there any other questions?

DR. DAVIDSON: I should like to address these remarks to Dr. Turnbull. I think his graph is very interesting, but I wonder if it does not show that New York and California are strong school systems rather than the existence of a city-rural difference. Also, there is this possibility, that in New York you have proportionately more centralized school districts, so that you really do not have a typical rural school situation. Furthermore, the curriculum is clear in New York and California, maybe more so than in the other states, although I believe the southern states are picking up now in the matter of what they teach in the schools. However, these data have gone back into the past, because these people have grown since public school days, and there may be a change in the coming youth.

There was a recent study done in Kentucky, in Caldwell County, I believe, and their picture looked something like this graph of the Alabama and Georgia group you have here. Maybe you can call it a matter of culture, but I think perhaps it is a matter of education more than culture.

DR. TURNBULL: I would have no quarrel with that. I think I feel there is no basis in the data for separating education from other cultural aspects . . . causing the differences found in the graphs, that the brighter students from the rural areas in Alabama and Georgia may not, for some reason, . . . qualifying test. Differential selection . . . . Whether or . . . direction, I am not . . . impression that these results . . . representing the educational systems of the four regions . . . distinction is blurred . . . selection that took place before the tests were given.

DR. SCATES: May I say a word with respect to Dr. Anastasi's emphasis on the correlation with the criterion as a measure of validity? I cannot go as far as she does on that point. We must recognize that the criterion itself has some "bugs" in it; that the criterion is just as difficult to define, and sometimes more so, than the trait in which we are immediately interested. I know it is a very simple, neat, and, in the abstract, a logical thing to say that we will set up tests which correlate well with a criterion, but usually this criterion is not the well defined thing we assume it is when we glibly refer to it.

The research process operates in both directions, not just one. The tests we set up frequently serve in helping to redefine the criterion by throwing light on its complex structure. They may prove also to have various forms of utility of their own and receive justification in part for this reason. In the case of intelligence, we are not trying to obtain and define a trait solely to predict any one particular thing; we are trying to define, and gradually refine our concept of, a trait because we are interested in a workable concept of intelligence itself. In the process of thus establishing a useful trait, we desire to know its many characteristics, such as correlation with various things. Some of these things may be regarded more or less as criteria. But the trait being measured has rights of its own, and correlations with various criteria, while furnishing indexes of certain utilities, do not constitute the sole measure of the essential nature of the trait being measured.
CHAIRMAN BUROS: Do you wish to reply to that, Dr. Anastasi?

DR. ANASTASI: Yes, just briefly. I would like to say that I fully agree that the criterion has "bugs" in it, and I wanted to emphasize just that. When the criterion has "bugs" and we validate a test against that criterion, then the test has those "bugs," too, and we must not forget that the test has them. If we call it an intelligence test, that label will not eliminate the "bugs." If we define intelligence and then forget that definition in the process of constructing the test or validating it, we have not thereby eliminated the "bugs." That is just why I want to focus attention upon the criterion.

DR. SCATES: To accomplish this end we must give attention to the "bugs" in the criterion as well as those in the trait. We cannot properly place our emphasis solely on either one or the other.

DR. ANASTASI: I do not know what a "trait" means in such a case.
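Dr. Haggard's remark that a validity coefficient of about .50 "leaves enough variance unaccounted for" can be made concrete with the conventional reading of a squared correlation as the proportion of criterion variance predictable from the test. The worked figures below are illustrative arithmetic, not values reported in the discussion; σ_y denotes the criterion's standard deviation and s_{y·x} the standard error of estimate.

```latex
r_{xy} = .50, \qquad r_{xy}^{2} = .25, \qquad 1 - r_{xy}^{2} = .75,
\qquad s_{y\cdot x} = \sigma_y \sqrt{1 - r_{xy}^{2}} \approx .87\,\sigma_y
```

So three-quarters of the criterion variance lies outside what the test predicts, and errors of prediction remain nearly nine-tenths as large as the criterion's own spread. That is the room within which two tests of equal validity can still differ in their relation to socio-economic status.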



PANEL II

Uses and Limitations of 
Factor Analysis in Psychological 
Research 







Uses and Limitations of Factor Analysis
in Psychological Research 



GEORGE K. BENNETT




Although the title of this morning's panel discussion is "The Uses and Limitations of Factor Analysis in Psychological Research," I think I should make it clear at the beginning that I am going to talk about only a limited portion of this topic. My concern will be primarily with the production of useful test batteries and the contribution that is made to them by factor analysis. However, I should like to mention briefly some general notions about factor analysis which seem to be pertinent to this particular application.

To many people one of the great appeals of factor analysis is its apparently solid foundation in mathematical theory. To a certain extent this is a valid belief. The problem of factor analysis is, from a geometric viewpoint, the problem of finding the minimum number of reference axes needed to describe a distribution of scores. Whereas the original references are the tests, the new references are the factors, and there are to be fewer factors than tests. This problem is clearly mathematical in nature, and to seek its solution is consistent with the scientific principle of parsimony. However, once the reference axes have been determined, the process ceases to be mathematical. The identification or naming of factors and the use of the factors or the results of factorial analysis in practical psychological work is no longer a mathematical problem. From here on we are concerned with such questions as the extent to which the resulting factors have been influenced by the composition of the initial battery of tests, the characteristics of the sample from which data were obtained, the applicability of the results to other groups, the relations of psychological and mathematical theories in converting the results to practical and feasible testing procedures, and finally the rather simple question: Now that we have factorial results, what are we going to do with them?
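The mathematical part of the problem, finding how few reference axes will reproduce the test intercorrelations, can be illustrated with Thurstone's centroid method, the usual hand-computation procedure of the period. The paper does not prescribe a method, so this is only a sketch: it extracts the first centroid factor and the residual correlations, and it omits the sign reflection needed before later factors can be extracted. The diagonal (communality) convention and the function names are assumptions for illustration. If the residuals after one factor are near zero, a single axis suffices for that battery.

```python
def first_centroid_factor(r, communalities=None):
    """Loadings on the first centroid factor of a correlation matrix.

    r: square list-of-lists correlation matrix.
    communalities: values to place in the diagonal; if omitted, each column's
    largest absolute off-diagonal correlation is used (a common convention,
    assumed here).
    """
    n = len(r)
    m = [row[:] for row in r]
    for j in range(n):
        if communalities is not None:
            m[j][j] = communalities[j]
        else:
            m[j][j] = max(abs(m[i][j]) for i in range(n) if i != j)
    column_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    total = sum(column_sums)
    if total <= 0:
        raise ValueError("centroid method needs a positive grand total")
    # Each loading is its column sum divided by the square root of the grand sum.
    return [s / total ** 0.5 for s in column_sums]


def residual_correlations(r, loadings):
    """Correlations left unexplained by one factor: r_ij - a_i * a_j."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


# Hypothetical battery whose intercorrelations arise from one common factor.
true_loadings = [0.9, 0.8, 0.7, 0.6]
r = [[1.0 if i == j else a * b for j, b in enumerate(true_loadings)]
     for i, a in enumerate(true_loadings)]
a = first_centroid_factor(r, communalities=[x * x for x in true_loadings])
resid = residual_correlations(r, a)
largest = max(abs(resid[i][j]) for i in range(4) for j in range(4) if i != j)
print([round(x, 3) for x in a], round(largest, 6))
```

With the exact communalities supplied, the one-factor structure is recovered and the off-diagonal residuals vanish; with estimated communalities or a real battery, the residuals show how much is left for further axes.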

Coming to the actual process of the construction of a battery of tests by means of factorial analysis, the steps are something like this. Since no factors can eventually be obtained which are not included among the variables initially studied, factorial analysis ordinarily begins with a large number of tests. It is desirable that these tests represent the largest possible variety of abilities and, furthermore, that each test be a reasonably pure one. "Purity," in this connection, refers to homogeneity of content and process. Inasmuch as relatively few pure tests have been in general use, the factor analyst often finds it desirable, if not necessary, to construct new tests for this purpose. As we all know, test construction is a time-consuming and expensive process, particularly when one utilizes conventional methods of item analysis to construct a power test in which the items have high correlation with total score and are arranged in order of difficulty. Furthermore, power tests usually consume extended periods of time, and in a situation where as many as 60 tests are to be administered to each subject, the total time required can reach prohibitive lengths. Consequently, we find the major portion of original test batteries in these situations consisting of highly speeded tests of relatively simple functions in which the score depends largely on the number of attempts made per unit of time.

After the matrix of correlation coefficients has been obtained, the initial factor loadings are computed and, according to some factor analysts, the axes should be rotated so as to maximize the number of zero loadings, so that the factors shall make sense. Although the initial factors by definition have no correlation with each other, the rotated axes often are not entirely independent; in other words, there is some sacrifice of independence for the sake of improved factor identification. If practical use is to be made of the factorial results, it is necessary that the test battery be abbreviated to manageable lengths. This usually means not over three or four hours of testing time, and the pressure from teachers and school guidance personnel makes even shorter times more advantageous. This means that the original battery of 60 tests must be reduced to a much smaller number, say twelve or fifteen as a maximum. This involves selecting those tests which, either singly or in combination, will yield the best estimate of each factor. If we have as many as six factors, this means that no more than two or three tests can be used to identify each, unless a particular test is weighted separately for different factors. Since the correlation of particular tests with the first factors extracted tends to be high, reasonably good identification of two or three factors will ordinarily result, but some factors often have rather low loadings in any test, with subsequent poor estimation from any combination of a small number of tests. Although the factors are correlated only to a small extent, the individual tests are usually much more highly correlated, and score combinations from these tests equally so. This leads to the situation reported by Crawford and Burnham (1), among others, in which the average intercorrelation of Thurstone's PMA battery is reported as .36, whereas the Yale battery, not constructed on a factorial basis, yields an average intercorrelation of only .41. This would appear to be a very small gain in independence of score for the factorially constituted battery.
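The comparison with Crawford and Burnham's figures rests on the average intercorrelation of a battery, that is, the mean of the correlations among its tests, each pair counted once. A minimal sketch with a hypothetical matrix; it takes a simple arithmetic mean of the coefficients, whereas some workers average through Fisher's z transformation, and the source does not say which was used for the figures of .36 and .41.

```python
from itertools import combinations


def average_intercorrelation(r):
    """Mean of the off-diagonal correlations, each test pair counted once."""
    pairs = list(combinations(range(len(r)), 2))
    if not pairs:
        raise ValueError("need at least two tests")
    return sum(r[i][j] for i, j in pairs) / len(pairs)


# Hypothetical three-test battery.
battery = [
    [1.00, 0.30, 0.40],
    [0.30, 1.00, 0.50],
    [0.40, 0.50, 1.00],
]
print(round(average_intercorrelation(battery), 2))  # → 0.4
```

On this index the gap between .36 and .41 is only .05, which is why the factorial battery is said to gain little independence of score.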

A far more important defect, which is not due to the factorial process per se, is the substitution of "factorial" validity for real or practical validity. So far as I know, the authors and publishers of factorially constructed test batteries have been satisfied to report factorial validity and to imply that these are adequate substitutes for what they somewhat condescendingly refer to as . . . . From the standpoint of usefulness to the counselor, factorial validity is a wholly inadequate substitute. The counselor is faced with the necessity for making a series of differential predictions in order to estimate the degree of success and satisfaction which his client may expect in each of the several courses of action that are feasible for him to undertake. In order to make such decisions, the counselor needs to know the extent to which his test scores are important to success in certain school courses and jobs. While the persons who have constructed factorial batteries have not been unaware of this need and have listed occupations with which, in their judgment, the various factorial scores may be expected to have positive relationships, it is my belief that this is the flimsiest sort of conjecture, inasmuch as no experimental data are brought forth in support of these contentions. If it is reasonable to expect that the authors and publishers of non-factorial test batteries should produce evidence of validity against realistic criteria, it appears also reasonable to require evidence of the actual validity of the factor scores resulting from factorially constructed batteries. This is particularly true in the case of less well defined factors, since often these do not coincide even approximately with any traits for which some evidence of validity has previously been obtained. For example, what interpretation can one give to a factorial score on the basis of such a definition as this:

". . . ability, perhaps combined with . . . numerical ability, and a certain amount of dexterity, and verbal ability, with . . . might be linked with . . . or temperament factor which gives mental . . . a singleness of direction and . . . similar to the . . ."

This statement was made in reference to an unrotated factor, but equally . . . statements, in perhaps fewer words, have occasionally been made with regard to rotated factors.

This brings us to the problem of what can be done to make factorial . . . more useful in terms of test battery construction. The first and most obvious step would be to undertake a series of realistic validational studies to determine the extent to which each of the factor scores is predictive of success in various scholastic courses and occupational categories. It may be that singly and in combination factorial scores have definite advantages over the scores from a good battery of tests constructed according to more traditional principles. I very much doubt that this will be the case, but I am willing to admit the possibility. A much more realistic application of factorial analysis would include a number of criterion scores among the variables initially studied. If these criteria could represent a realistic sampling of several quite different scholastic or occupational activities, the resulting knowledge could have very extensive significance for measurement and perhaps for educational philosophy.

It is probable that educational situations ordinarily do not offer the opportunity for obtaining criteria of this sort for any adequate number of individuals. It might, however, be possible to set up an experimental school in which several quite different types of training could be offered within the span of one academic year to a large number of students. Objective and comprehensive proficiency tests would be required for each type of training so that reliable and meaningful criterion scores could be obtained. A factor analysis of tests and criteria for these students would then result in factors which would have meaning in specific situations. It might be found that some factor scores are suitable for predicting a large number of criteria, or it may be found that criteria which are apparently very much alike require different factors for adequate prediction. Whether we use conventionally-made tests or factor scores or projective methods, we must still establish validity as the power to predict for specific groups and for specific criteria. This type of experiment might ultimately result in the extraction of factors which simplify the counselor's task and effect considerable economy in testing time. On the other hand, it might indicate there is no great value in factor scores. But until we try the experiment, we won't know the answer. In view of the fact that great sums of money are being spent on educational experiments, the cost of such an undertaking would not seem to be prohibitive.

Lacking the types of validational evidence that have been briefly suggested in the preceding paragraphs, it is my belief that factorial analysis has not demonstrated any unique values in terms of test battery construction, although it has given us some useful clues to mental organization and has, perhaps, provided some reinforcement of the notion long ago stressed by Kelley, Hull and others that, if several tests are to be used in combination, low intercorrelations are desirable.

REFERENCES

(1) Crawford, A. B., and Burnham, P. S. Forecasting College Achievement. New Haven: Yale University Press, 1946.

(2) Guilford, J. P. Manual of Instructions and Interpretations for the Guilford . . . . Beverly Hills, Calif.: Sheridan Supply Co., 1947.

(3) Wolfle, Dael. Factor Analysis to 1940. Psychometric Monographs, No. 3.
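The notion credited above to Kelley, Hull and others can be checked with the standard formula for the multiple correlation of a criterion with the best-weighted combination of two tests. In the sketch below (function name and values hypothetical), two tests of equal validity are combined at different levels of intercorrelation; the lower the intercorrelation, the higher the multiple correlation. The rule assumes both tests have positive validities of similar size; a test with no validity can still help through suppression, which this example does not show.

```python
def multiple_r(r1c: float, r2c: float, r12: float) -> float:
    """Multiple correlation of a criterion with the best linear
    combination of two tests.

    r1c, r2c: validity of each test against the criterion.
    r12: correlation between the two tests.
    """
    if abs(r12) >= 1:
        raise ValueError("the two tests must not be perfectly correlated")
    r_squared = (r1c ** 2 + r2c ** 2 - 2 * r1c * r2c * r12) / (1 - r12 ** 2)
    return r_squared ** 0.5


# Two hypothetical tests, each correlating .40 with the criterion.
for r12 in (0.6, 0.4, 0.2, 0.0):
    print(f"r12 = {r12:.1f}  ->  R = {multiple_r(0.4, 0.4, r12):.3f}")
```

As the intercorrelation falls from .6 to 0, the multiple correlation rises from about .45 to about .57, although neither test's own validity has changed.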






Uses and Limitations of Factor Analysis
in Psychological Research



H. J. EYSENCK



Factor analyfiw ha^ bfc*much criiiv crcncc, iniuiiioa, or oi\ the basb of 
ciztd by orthv)dpx ^tatmimritas well as co^i^on tente. 
by i(H<)pathtcairy-g[imded pjychologtitt,'^* It will be clear tha^ factpr imalym 
although fpr different tfnd frequently differentiated ^om All the'Mthodok 
opfHMMte reaiont . Th^ cnticttim often . procedOtet of suaturtics — detcrmiiuitidfi 
stem from itia^eqiuiteundersunding — ,of sirai%;ance of'differencas, pnalym 
inadequate understanding of the^a*- of vtfmnce and covarianccf, diicriiiii^ 
Mimptions' iavotvied and«t)ie suttstical function analym, aequentiaraAal. 

methods used on tht part of th'e lidio-. •"•^ •o^forih—b/ ihfc(iatt that 
paths,*' and inadequ^re und<iriUndipg ''**^htr^all thf cA^thodox procMy^&teaf 
of the purpoies Underlying its use on null -hypothesis a^garda^ difFer- 



the part of the statisticians^ As always, 
the uses and Itmitationk of a mathe 
matic^l method of analysis depend on 
the purposes which, ft is desig^ned to 
i^erve/In the case of factor analysis, 
there appear to be two main purposes: 
l) discover taxonomic principles 
in a fivid in which id little h known 
that no reasonable hypotheses can be 
set up and tested^ and a) to test de« 
ductionn made from taxonomic hy- 
pothrw in a field studied sufficiently 
to alli)w the setting up of promising 
theories* Irt botli caries, it will*he sreoi 



cnces between '^rrrtain *group^ which 
ire known a friori^ or ta4l6nf 0/ * 
previous . -experimental tnve^^tion/ 
factor analysis^atttmpta t<|/aiii;iwer the \ 
mtich more fundltnenial l|ue«tion: 
** What are the principles 4)f c|^i^ca* 
tion which obtain siT ti-^is* prtkrybr 
field, and accordion to whicfi ex^ieri* 
mental ||rot}ps oughft to be selected for. • 
the determination of stgntficaAt differ* 
ences?*^ 

This differentiatM>n linkfi up withi 
the fundamental proMeffi in mental 
testing, namel]^ that of validity/ W^ « 
must distingutih my i.Iearly between 



the pri>Wcm IS one of taxonomy ^r t)'pef of v^Uidity. which wt may 

clas».ficstM>n} factors are conceived as tentatively call lower-<Kd*?7Validiti' 

principles of classification which ^llow higher^rder t}lir*rtyV The usiiaY 

us to order our field of tiudy in a way textbook definition of validity as 

determined br jK ? properties of th^ ''agreement with a criterion'* refers to 

mtienil with w h ^v are dealing, lowef-order validity, and ts ewentally 

rather than in terrfk.;, ;ruil {fchv*pref- an engineering concept. If we Hect 

I«) 



> 



-^mmm^fmm^rmm ^^^'^ invitational conference 

a simple criterion, such as number of items soldered per hour, or number of accidents per month, we are easily able to determine the "validity" of a given test by correlating it with the criterion. But while such a determination may have a certain amount of practical usefulness in human engineering, its scientific value is almost precisely nil. The difficulty in this conception of "validity" is brought out clearly when we apply it to truly psychological concepts such as "intelligence," or "extraversion," or "suggestibility." Here we have either no criterion at all, or a multiplicity of criteria which do not correlate very highly with each other. We must therefore look for a criterion to decide which of the many criteria to use, a procedure which gives rise to an infinite regress of looking for criteria to decide which of several criteria to use in deciding which is the correct criterion, and so forth.

Once this situation is admitted, in which no clear-cut external criterion is available (and this is the case in connection with every genuinely psychological concept I know of), we must have recourse to some form of higher-order concept of validity. Such a concept can only derive from the adoption of the internal-consistency approach; in other words, as in every other science, the isolated fact acquires meaning only in relation to other facts, and interpretation, measurement, and comparison become possible by coordinating the isolated facts in a certain … of functional development through the use of the hypothetico-deductive method (1). It is not claimed that factor analysis is the only possible treatment of the internal-consistency approach, but in the present stage of development of mental testing procedures no other method is available which will answer the taxonomic questions which arise. Nor is it claimed that factor analysis is perfect in its present form; all of us who have used it on any large scale will agree that there are many aspects of it which require improvement or even drastic overhauling. But for the type of problem I have outlined, there simply does not appear to be an alternative, although that does not mean that we should not go on looking for one which is free of the admitted difficulties attending the factorial approach.

Two brief examples will illustrate the use of factorial methods in relation to the two main purposes I mentioned at the beginning. The first related to the discovery of taxonomic principles in a field in which so little is known that no reasonable hypotheses can be set up and tested. In our early work on factors determining aesthetic preferences we made an attempt to discover the reasons underlying preferences for different types of poetry (2). The literature threw no light on this problem, and consequently a factor-analytic design was set up. Some thirty poems, each relatively short, were ranked in order of preference by our subjects; these rankings were correlated and factor analysed. Two factors emerged without rotation, which could be interpreted very clearly on the basis of the poems most liked and
most disliked by the subjects having high positive or negative saturations respectively on these factors. The first factor divided those who like a simple rhyming scheme (abab), a regular, evenly accentuated rhythm, and a clearly defined ending to each line from those who like complex rhyming schemes, irregular, uneven rhythm, and lines that continue from one to the other without clear breaks. I do not want to waste time by discussing the second factor also; one factor will illustrate my point sufficiently. Starting from a position in which we have no guiding principles as to how we should classify our material, we emerge with a clear-cut hypothesis determined essentially by the internal organization of the preference judgments. This hypothesis admits of experimental development; we have tested it by predicting preference for poems not contained in the original set, and have developed it further by showing that similar principles of organization obtain in preferences for pictures, jokes, statues, and other aesthetic objects, and by showing that this simplicity-complexity factor is correlated with temperament (3). Many other examples could be given, but I think this one is sufficient to illustrate our point that factor analysis may give rise to classificatory hypotheses.

As an example to illustrate our second claim, namely that factor analysis can be used to test a classificatory hypothesis, I may perhaps quote our studies in suggestibility (4). The hypothesis was set up that eight well-known tests of suggestibility measured one and the same underlying variable, which might be identified with the concept of "suggestibility." Intercorrelations were run between the eight tests, and a factor analysis performed. The analysis showed that two factors were needed to account for the observed correlations within the limits of the sampling error, and that the tests were grouped in the two-dimensional space of this two-factor pattern in such a way that four tests (the body sway test originated by Hull, the Chevreul Pendulum test, and two arm levitation tests) constituted one group, while tests of the Binet type (progressive weights, etc.) constituted the other group. These two groups were uncorrelated. The original hypothesis was thus conclusively disproved, and a new hypothesis suggested that we are dealing with two types of suggestibility, which we called "primary" and "secondary" suggestibility. When this new hypothesis was tested on different populations and with new tests, such as a measure of hypnotizability and a variety of tests of a sensory kind, the deductions were confirmed in each instance (5). Here, then, we have an example of how factor analysis can be used to disprove a hypothesis, namely that of a general factor of suggestibility; how it can suggest instead another hypothesis; and how it can be used to test this new hypothesis.
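The poetry-preference design described above (subjects' rankings intercorrelated, then factor analysed without rotation) can be sketched in a few lines. This is a minimal illustration under assumptions of our own: the rankings are invented, Spearman's coefficient is used for the rank correlations, and the first principal axis found by power iteration stands in for whatever factoring method the original study actually used. Subjects whose loadings on this bipolar factor have opposite signs correspond to the two preference types.

```python
import math

# Hypothetical rankings of six poems (1 = most liked) by four subjects:
# S1 and S2 are built to favour the same poems, S3 and S4 to reverse that order.
rankings = {
    "S1": [1, 2, 3, 4, 5, 6],
    "S2": [2, 1, 3, 4, 6, 5],
    "S3": [6, 5, 4, 3, 2, 1],
    "S4": [5, 6, 4, 3, 1, 2],
}


def spearman(a, b):
    """Spearman rank correlation for two complete rankings without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def first_axis_loadings(r, iterations=200):
    """Unrotated loadings on the first principal axis, by power iteration."""
    n = len(r)
    # Start from the column with the largest sum of squares, so the start
    # vector is not orthogonal to a bipolar first axis.
    start = max(range(n), key=lambda j: sum(r[i][j] ** 2 for i in range(n)))
    v = [r[i][start] for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return [math.sqrt(eigenvalue) * x for x in v]


subjects = list(rankings)
r = [[spearman(rankings[a], rankings[b]) for b in subjects] for a in subjects]
loadings = dict(zip(subjects, first_axis_loadings(r)))
# S1 and S2 load with one sign, S3 and S4 with the other (the overall sign is arbitrary).
print({s: round(value, 3) for s, value in loadings.items()})
```

The sign of a principal axis is arbitrary, so only the split of subjects into two opposite-signed groups carries meaning here.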





A third possible use of factor analysis may lie in a field in which it has not hitherto been used to any significant extent, namely that of the description of social groups. It is customary to describe individuals and groups in terms of scores on psychometric tests; thus a group of democrats may be more "radical" than a group of republicans in terms of some measure of radicalism-conservatism. However, it is possible that differences between groups may be apparent more in the organization of component attitudes than in over-all scores. Two groups may not differ with respect to "radicalism" measured, but they may show differences with regard to the pattern of intercorrelations between the component attitudes. Factor analysis appears to be the preferred method for disclosing and quantifying such differences in organization, and it has been used in this way in our studies into the organization of social attitudes as determined by political party, by age, sex, and education, and by nationality.
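The point that groups may differ in the organization of their attitudes rather than in over-all scores can be illustrated with invented numbers. In the sketch below (hypothetical items and scores, not data from the studies cited), two groups have the same mean total on a two-item scale, yet the two component items correlate positively in one group and negatively in the other. A factor analysis run separately in each group would be expected to pick up exactly this kind of difference in intercorrelation pattern, which a comparison of means cannot see.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of five members of each group on two component items
# of a "radicalism" scale.
group_a = {"item1": [1, 2, 3, 4, 5], "item2": [2, 1, 4, 3, 5]}
group_b = {"item1": [1, 2, 3, 4, 5], "item2": [3, 5, 1, 4, 2]}


def summarize(group):
    """Mean total score and the correlation between the two items."""
    totals = [a + b for a, b in zip(group["item1"], group["item2"])]
    return sum(totals) / len(totals), pearson(group["item1"], group["item2"])


summary = {"A": summarize(group_a), "B": summarize(group_b)}
for name, (mean_total, r12) in summary.items():
    print(f"group {name}: mean total = {mean_total:.1f}, r(item1, item2) = {r12:+.2f}")
# group A: mean total = 6.0, r(item1, item2) = +0.80
# group B: mean total = 6.0, r(item1, item2) = -0.30
```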

If these are the uses of factor analysis, what are its limitations? One serious limitation lies in the lack of statistical criteria of significance for factor loadings, for variances, and for residuals. While we have approximations, and at least one method, namely that of Lawley, which permits of the application of such criteria, nevertheless the absence of practicable and accurate methods for estimating significance is a serious business. Another limitation is implied in the outline of the use of factor analysis given above: the evidence given by factorial methods is often suggestive rather than definitive, permissive rather than conclusive. However, factor analysis shares this limitation with almost all other scientific research methods.

While these limitations are admitted, others, also often suggested by critics, are not. The fact that factor analysts do not always agree, for instance, is no more a criticism of factor analysis, and sets up no more necessary limitations, than the diametrically opposed views held by the schools started by Weierstrass and Kronecker on the nature of so fundamental a concept as number limit the usefulness of mathematics. The fact that factor analysis makes certain assumptions regarding linearity and the additive nature of its variables does not constitute a necessary limitation, as these assumptions can be tested, and as methods of factor analysis not dependent on them can be envisaged. The fact that factorial analyses often give absurd results constitutes a limitation of factorial analysis only in the sense that this statistical method does not guarantee success when inappropriately used and inexpertly handled, any more than does the calculus or any other mathematical technique. Factor analysis requires just as insightful a statement of the problem, just as careful a design of the experiment, and just as skillful a psychological interpretation of the results as does any other technique; if used in any other way it will prove misleading and unhelpful. Its limitations, in so far as they are not merely of a temporary technical nature, are defined by its purpose: while useful and indeed essential for certain purposes, it throws no light on other types of problems and does not attempt to displace other methods more adequate for their solution. In other words, like all scientific methods, its usefulness is not universal but circumscribed, and only the mature judgment of the expert can decide whether in a given situation it is likely to give him the answer he wants.



REFERENCES

(1) Eysenck, H. J. Dimensions of Personality. N.Y.: Macmillan, 1947.
(2) Eysenck, H. J. Some Factors in the Appreciation of Poetry, and Their Relation to Temperamental Qualities. Charact. and Pers., 1940.
(3) Eysenck, H. J. The Experimental Study of the "Good Gestalt": A New Approach. Psychol. Rev., 1942, Vol. 49, Pp. 344-364.
(4) Eysenck, H. J. Suggestibility and Hysteria. J. Neurol. Psychiat., 1943, Vol. 6.
(5) Eysenck, H. J., and Furneaux, W. D. Primary and Secondary Suggestibility: An Experimental and Statistical Study. J. Exp. Psychol., 1945, Vol. 35, Pp. 485-503.



Uses and Limitations of Factor Analysis
in Psychological Research 



PAUL HORST 



It seems to me that scientific investigations in any discipline must be concerned primarily with the variables which are thought to be fundamental to the science or discipline. If there is general agreement on what the fundamental variables of the science are, then it is difficult to see how factor analysis of any sort could be of value. However, if the investigators within a discipline cannot generally agree on what these variables are, then I believe factor analysis can play a useful role in providing an objective basis for agreement. Certainly in psychology there is a marked lack of agreement with reference to those variables important for describing, predicting, and controlling human behavior.

But in what sense may we regard any set of variables as basic for the science? In general, the factor problem arises whenever in a given discipline there exist, first, a large number of entities belonging to a specified class, and second, a large number of experientially distinct attributes on the basis of which the entities are differentiated from one another. In particular, the entities may be people and the attributes may be tests. Again, the entities may be corporate stocks and the attributes may be successive days for which the prices of each of the stocks are given. Or the entities might be various geographical regions and the attributes might be variables such as wind velocity and direction, relative humidity, barometric pressure, and other atmospheric variables.

Factor analysis assumes that there exists a relatively small number of attributes on the basis of which the entities may be differentiated from one another about as adequately as on the basis of the large number of experiential attributes. Factor analysis assumes that the numerical values of the experiential attributes can be expressed as functions of the primary variables. The traditional and current methods of analysis have assumed these functions to be linear, but these assumptions are not necessary except for practical convenience.

Essentially, then, the purpose of factor analysis is to simplify in a very specific manner our description of observable phenomena. This simplification is achieved by reducing the number of attributes we require to differentiate one entity from another. Presumably, we may regard the primary attributes as a subgroup of the larger group of experiential attributes, if we include not only those which have already been
numerically evaluated, but also those which might conceivably be so evaluated in the future. It can be demonstrated, however, that if a relatively small group of attributes exists such that all the others may be expressed as functions of this group, then there will also exist an infinite number of such groups, even though they may not all be experientially independent. The question therefore arises as to which of the many possible groups of attributes we shall select as a basis for estimating the large number of experiential attributes. Again, we shall adopt the criterion of simplicity of description and define it in numerical terms. Let us assume that we have approximately 30 attributes in each subgroup, and that all of the 30 are needed to estimate all of, say, 10,000 experiential attributes within the limits of accuracy we impose upon ourselves. Assume, however, that for one of these groups of 30 we need an average of only 15 of the attributes to predict with acceptable accuracy each of the experiential variables. For another group we need an average of 20 of the primary variables to estimate each of the 10,000 experiential variables. We find, however, finally that for a particular group of fundamental variables we need only an average of 7 of the 30 to estimate each of the 10,000 experiential variables. For no other group is the average number so low as 7. From the point of view of simplicity of description, therefore, we take the group which requires an average of 7 and designate it as the group of basic or primary attributes.

Presumably, if we had all of the entities within a universe with respect to all of the experiential attributes, the principal axis method would enable us to determine the minimal number of attributes required to estimate all of the other experiential attributes. The method, however, would not tell us which of them might most appropriately be assigned to this minimal subgroup.

And this is where the second criterion is needed. Many of you have, doubtless, already anticipated a more commonly known name for the second numerical criterion of simplicity which I have just discussed. I refer, of course, to the concept of simple structure. I have preferred, however, to formulate the concept of simple structure somewhat differently from the traditional one because I think the concept formulated in this way is less controversial.

But the crucial question is, "Will the identification of primary variables enable us to make more accurate predictions than would otherwise be possible?" Let us see how it might. In a two- or three-dimensional system it is not difficult to show that the primary variables in the system are those from which all the other variables may be estimated without the use of negative weights or coefficients. If one or more of the primary variables is replaced by a nonprimary variable in the prediction battery, then the nonprimary predictors will cause the primary predictors to take on negative weights when certain of the other variables are estimated. It is quite probable that this
principle will extend to a system of any number of dimensions and that, therefore, if other than primary variables are included in the predictive set, negative weights may appear in the estimation of certain of the other variables. In fact, primary variables might usefully be defined as that minimal set of variables from which all others may be estimated with nonnegative weights.
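Horst's claim about negative weights can be checked in the simplest case. The sketch below assumes two primary variables and one nonprimary variable that is exactly their sum, with no measurement error and no intercept (all numbers are hypothetical). Estimating the mixture from the two primaries gives weights of +1 and +1; once the mixture replaces one primary in the predictor set, recovering that primary requires a negative weight.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weights_two_predictors(x1, x2, y):
    """Least-squares weights (no intercept) for y ~ b1*x1 + b2*x2, from the normal equations."""
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Two hypothetical primary variables and one nonprimary mixture of them.
p1 = [1, 2, 3, 4, 5, 6]
p2 = [2, 1, 4, 3, 6, 5]
mix = [a + b for a, b in zip(p1, p2)]

# Primary predictors: the mixture is estimated with nonnegative weights.
print(weights_two_predictors(p1, p2, mix))  # (1.0, 1.0)
# The mixture replaces p2 as a predictor: estimating p2 needs a negative weight on p1.
print(weights_two_predictors(p1, mix, p2))  # (-1.0, 1.0)
```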

The implications of primary variables for accuracy of prediction should now be more apparent. If it were possible always to find a set of primary test variables for predicting success in school or vocations or elsewhere, then, presumably, we should never have negative regression weights. For certain special cases it is easy to show that predicted scores involving negative regression weights are less reliable than predicted scores all of whose regression weights are positive. It should probably not be difficult, therefore, to set up rather general conditions under which estimates made from primary variables would be more reliable than those made from nonprimary variables. Other things being equal, then, the use of primary variables in predictive batteries should enable us to make more reliable predictions.
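Horst does not say which special cases he has in mind. One familiar case, offered here as our own illustration, is the reliability of a weighted composite of two positively correlated tests: with weights (+1, +1) the composite is more reliable than either test alone, while with weights (+1, -1), a difference score, it is much less reliable. The sketch assumes standardized tests with errors uncorrelated with each other and with true scores; the reliabilities and the intercorrelation are hypothetical.

```python
def composite_reliability(weights, corr, reliabilities):
    """Reliability of sum(w_i * z_i) for standardized tests z_i:
    1 - sum(w_i^2 * (1 - r_ii)) / sum_i sum_j (w_i * w_j * r_ij),
    assuming uncorrelated errors (corr has unities in the diagonal)."""
    k = len(weights)
    total_var = sum(weights[i] * weights[j] * corr[i][j] for i in range(k) for j in range(k))
    error_var = sum(w * w * (1 - rel) for w, rel in zip(weights, reliabilities))
    return 1 - error_var / total_var


corr = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical intercorrelation of the two tests
rel = [0.8, 0.8]                 # hypothetical reliability of each test

print(round(composite_reliability([1, 1], corr, rel), 3))   # 0.875
print(round(composite_reliability([1, -1], corr, rel), 3))  # 0.5
```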

Is there any other way in which factor analysis might enable us to make more accurate predictions? It is easy to show that a factor analysis does not yield more information than is contained in the matrix of measures on which it is based. If the factor analysis is carried out so as to include all the information given in the correlation matrix, even errors of measurement, then multiple factor techniques yield results identical with, and hence no better than, the multiple regression methods. Most factor analysis procedures, however, assume errors of measurement. The problem is to estimate the original measures in terms of a much smaller set with sufficient accuracy so that the remaining variance may be considered as due to chance. If we assume that errors of measurement result in errors in regression coefficients, then one of the sources of error in applying regression weights to a new sample could be eliminated if errors of measurement in the original sample were excluded. The factor techniques may enable us, for a given sample, to obtain a more accurate estimate of the true correlations for that sample than is given by the experimental correlations. Therefore, it is conceivable that by means of the factor techniques we could, for any given sample, obtain a more accurate estimate of the true regression coefficients for that particular sample. The true regression weights for a given sample should, in general, be closer to the true population regression weights than would the regression weights incorporating errors of measurement in the sample. This suggests that regression weights obtained by factor techniques on a particular sample might yield more accurate predictions on subsequent samples. Therefore, the factor techniques should result in more accurate prediction of all phenomena, human, subhuman, and physical, where the fundamental variables have not been clearly isolated and agreed upon.

Let us now consider some of the limitations of factor analysis for psychological research. One of the more serious limitations comes from the fact that factor analysis has a tornado-like appetite for data. If you want to come out with results of any consequence you should have 50 or 60 variables or tests on at least 500 cases. Comparable-form reliability should always be incorporated in the design of a definitive factor analysis. Assuming that a single form of a test should be at least 10 minutes long, the two halves would take 20 minutes. This means that you could test only three variables an hour, so that it would take 20 hours to test all 60 variables. If you multiply this by 500 people, you have 10,000 man hours of testing time just to get your basic data. I would not be inclined to take very seriously the results of any factor analysis involving psychological tests which falls far short of 10,000 man hours of testing time. According to this criterion, very few factor studies to date can qualify as thoroughly respectable. Obviously, factor analysis is not for the lone wolf operator. It is too difficult to pick up 10,000 man hours of testing time. The collection of data for factor analysis projects, I think, will more and more have to be sponsored by large scale cooperative research enterprises.

Another limitation of factor analysis is the time required for the actual computations. The computation of the table of intercorrelations can be carried out fairly rapidly with modern computing machines. But if you are a small business man type of researcher, you cannot afford the equipment required to calculate 1,770 correlation coefficients on 500 people. But assuming you can get the data and the intercorrelations, untold man hours of labor may lie ahead before the simple structure matrix is obtained.

Before starting the factor analysis you'll have to decide what to do about the diagonal elements. Here you will run head-on into another rather serious limitation of current factor techniques. To date there is no clear agreement as to what should be used as the diagonal elements. Should you use unity? Should you use reliability coefficients? Should you use estimates of the communalities? If so, what are communalities? The communality of a test is a very obstreperous sort of thing that may jump around unpredictably from one test battery to another.

But even assuming that the communality is a fixed value for a given test battery, how will you define it? Can you say that the communality shall be such that the rank of the matrix is a minimum? Hardly that, because mathematically the rank of a matrix with unknown diagonals is determined solely by the number of variables in the matrix. If you define the communalities in this way, some of them could conceivably be greater than unity and some of them might be negative. You may therefore insist that, in addition to determining the value of the communalities solely on the basis of the number of variables,
no communality be less than zero or greater than unity. But assigning upper and lower bounds for the communalities is a long cry from assigning specific values to them.

Assuming, however, that you could solve for values of the communalities which were within the acceptable bounds and which enabled you to account completely for the intercorrelations with minimum rank, you have violated the basic definition of communality. The correlation coefficients include errors of measurement, and if you completely account for them, then the communalities must also have error variance in them. You might avoid this inconsistency by assuming that the rank r of the matrix is smaller than the rank t determined by unknown diagonals. What values now must you put in the diagonals so that, if you apply a principal axis solution carried through r components, the sum of the squares of the residual correlations will be a minimum? You might in turn let r take all values from 1 through t and determine t sets of diagonal values, one for each assumed rank, such that in each case the sum of the squares of the residuals would be a minimum for that rank. Assuming you had an appropriate method for determining the smallest rank for which the sum of the squares of the residuals was due to chance, you might take the corresponding values as the best estimates of the communalities. This seems like a pretty respectable operational definition of a communality, but how to go about finding values which will conform to it might be very difficult.

What, now, are some of the more common arguments for the use of estimates of communality in the diagonals? In the first place, it is argued that the table of intercorrelations can be more completely accounted for with a smaller number of factors. This statement is certainly true, but it expresses a purely mathematical artifact. By the same token, the use of communalities in the diagonal leaves a greater proportion of the total test variance unaccounted for. Actually, what we have when we use communalities in the diagonals is a more rather than a less complicated system for describing experiential phenomena. We start out with n attributes and wind up with n + m attributes, where n is the number of variables and m is the number of factors.

It has, of course, been pointed out that what is specificity for a test in one battery may be communality in another battery. The argument goes that if we analyze enough tests in enough different test batteries, eventually most of the specificities would be absorbed by communalities. The assumption seems to be that the use of communalities in the diagonals in a number of small overlapping battery analyses will eventually result in the same factor loadings that would be obtained if all conceivable tests were thrown into one gigantic battery and factored with unity in the diagonal. So far, however, the validity of the assumption has not been demonstrated.
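Horst's operational definition (for an assumed rank, choose diagonal values that minimize the sum of squared residual correlations) can be approached iteratively. The sketch below handles only an assumed rank of one and uses the iterated principal-axis scheme: place trial communalities in the diagonal, extract the first principal axis, and replace the trial values by the squared loadings until they settle. This procedure is our choice of illustration, not one Horst describes; on real data it may converge slowly or yield values above unity. The correlation matrix is built from hypothetical loadings, so the communalities it should recover (0.81, 0.64, 0.49, 0.36) are known in advance.

```python
import math


def first_principal_axis(matrix, iterations=200):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    start = max(range(n), key=lambda j: sum(matrix[i][j] ** 2 for i in range(n)))
    v = [matrix[i][start] for i in range(n)]
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam, v


def one_factor_communalities(corr, rounds=200):
    """Iterated principal-axis communality estimates for an assumed rank of one."""
    n = len(corr)
    # Starting values: each test's largest absolute correlation with any other test.
    h = [max(abs(corr[i][j]) for j in range(n) if j != i) for i in range(n)]
    for _ in range(rounds):
        reduced = [[h[i] if i == j else corr[i][j] for j in range(n)] for i in range(n)]
        lam, v = first_principal_axis(reduced)
        h = [(math.sqrt(lam) * x) ** 2 for x in v]  # squared loadings
    return h


true_loadings = [0.9, 0.8, 0.7, 0.6]  # hypothetical; used only to build the example
corr = [[1.0 if i == j else a * b for j, b in enumerate(true_loadings)]
        for i, a in enumerate(true_loadings)]
estimated = one_factor_communalities(corr)
print([round(x, 3) for x in estimated])
```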






It seems to me that the logical way to analyze this colossal matrix would be to put unity in the diagonals and fervently pray that the number of factors required to account for 90% of the total systematic variance would be a very small fraction of all the variables in the matrix. If our prayers were not answered, then we should have no scientifically honorable recourse but to commit hara-kiri, because the basic assumptions of scientific methodology would have been demonstrated as untenable. If, however, we decide to analyze this giant matrix by estimating the communality residuals at each stage, I suspect that the off-diagonal residuals would approach chance values when the common factor variance was still far short of 90% of the total systematic battery variance. If we found that most of the tests still had specific variance left, I would lay the results not to the inherent chaos of nature, but to the use of communalities in the diagonals.
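Horst's proposed test (unity in the diagonals, then ask how many factors reach 90% of the variance) amounts to counting principal components. The sketch below computes eigenvalues with the classical Jacobi rotation method and counts how many are needed to reach a given share of the total. Two simplifications are ours: the battery is a hypothetical pair of uncorrelated three-test clusters, and the total variance (the sum of the unit diagonals) stands in for Horst's "total systematic variance," which would exclude error variance.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def components_for_share(corr, share=0.90):
    """Smallest number of principal components reaching `share` of the total variance."""
    eig = jacobi_eigenvalues(corr)
    total = sum(eig)  # equals the number of tests when the diagonal holds unities
    running = 0.0
    for k, lam in enumerate(eig, start=1):
        running += lam
        if running >= share * total:
            return k
    return len(eig)


def two_clusters(r, size=3):
    """Hypothetical battery: two uncorrelated clusters of tests, correlation r within a cluster."""
    n = 2 * size
    return [[1.0 if i == j else (r if i // size == j // size else 0.0) for j in range(n)]
            for i in range(n)]


print(components_for_share(two_clusters(0.6)))  # 5
print(components_for_share(two_clusters(0.9)))  # 2
```

With moderate within-cluster correlations, five of the six components are needed; only with very high correlations does the count drop to the two clusters.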

Aside from the fact that communalities are said to result in a smaller number of factors for any given study, a more plausible justification for their use is also presented. It has been alleged that simple structure is easier to attain when communalities are used in the diagonal rather than unity. But to date no quantitative or objective criteria for simple structure have been advanced. Therefore, the question of which of two sets of rotations on the same data comes closer to simple structure cannot be answered at the present time, except to the satisfaction of a particular experimenter. Consequently, since no techniques are available for testing the claim that the use of communalities in the diagonals enables one to obtain less ambiguous simple structures, one can neither affirm nor deny the claim.

Actually, it seems to me that far too much emphasis has been placed on communality and not enough on specificity. From the point of view of prediction, the usefulness of a group of variables varies inversely as the common factor variance in the battery. Ideally, we should have a battery of measures with very low or near zero intercorrelations, so that there would be no common factor variance and most systematic variance would be specific.
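Horst's remark that the usefulness of a battery varies inversely with its common-factor variance can be seen in the standard formula for the squared multiple correlation with two predictors. In the sketch below both predictors are assumed to have the same hypothetical validity r; the formula then reduces to R^2 = 2r^2 / (1 + r12), so the gain from a second test shrinks as the intercorrelation r12 between the tests grows.

```python
def r_squared_two_predictors(r1y, r2y, r12):
    """Squared multiple correlation of a criterion y with two predictors:
    (r1y^2 + r2y^2 - 2*r1y*r2y*r12) / (1 - r12^2)."""
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)


# Two predictors, each with a hypothetical validity of 0.5, at rising intercorrelation.
for r12 in (0.0, 0.3, 0.6, 0.9):
    print(f"r12 = {r12:.1f}: R^2 = {r_squared_two_predictors(0.5, 0.5, r12):.3f}")
```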

There are, then, at least four unanswered questions concerning the communality. First, can you define it; second, can you calculate it; third, should you use it at all; and fourth, how should you use it?

I have suggested that I regard the concept of simple structure as a basic scientific contribution. For this reason, I think that the techniques for achieving simple structure are of great importance and that any defects in these techniques are serious limitations to the uses of factor analysis in psychological or other scientific research. One of the questions which arises in connection with rotation to simple structure is whether orthogonal or oblique rotation should be employed. We know that matrices of intercorrelations can be made to vary greatly by systematic selection with reference to all or some of the variables. It therefore seems plausible that intercorrelations of either experiential or primary variables may well be regarded as a function of the particular sample on which the measures are drawn. It would then seem plausible that a rotation to simple structure on one group might be nearly orthogonal, while on a specially selected group it might be clearly oblique. If, then, we regard the specific character of the transformation as being a function of the particular sample, it would be unrealistic to expect that, for all groups, the transformation for any particular battery should be orthogonal. Therefore, it seems to me that the question of whether to use oblique or orthogonal transformations is no longer a critical issue.
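That intercorrelations depend on how a sample was selected is easy to demonstrate. In the toy sketch below (hypothetical scores), two attributes are exactly uncorrelated over a full grid of cases, but become negatively correlated once only cases with a high total are retained, as might happen when admission depends on a composite score. The selection rule and the scores are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Every combination of two attribute scores from 0 to 9: uncorrelated by construction.
full = [(x, y) for x in range(10) for y in range(10)]
# Explicit selection on a composite: keep only cases with x + y >= 12.
selected = [(x, y) for x, y in full if x + y >= 12]

r_full = pearson([x for x, _ in full], [y for _, y in full])
r_selected = pearson([x for x, _ in selected], [y for _, y in selected])
print(f"full group: r = {r_full:.3f}; selected group: r = {r_selected:.3f}")
```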

A much more serious problem arises in connection with the actual techniques of rotation and the criteria for simple structure. At the present time, no completely objective and unique method is available for the rotation operations. One of the most pressing needs for research in factor analysis, and I believe we might say for psychology in general, is research in more adequate methods for the transformation of arbitrary factor matrices. Most of the factor techniques will indicate the minimum number of attributes required, but only the rotational procedures will identify that unique set which is maximally parsimonious for the description of most significant experiential attributes in the system. Therefore, I believe that the simple structure concept, or something equivalent to it, constitutes the greatest promise of factor analysis to psychological research, while at the same time currently available procedures for attaining simple structure represent its most serious limitations.
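Horst stresses that no objective criterion for simple structure was available. Purely as an illustration of what an objective criterion could look like (this is not a procedure from the paper, and the 0.1 threshold is arbitrary), the sketch below scores each orthogonal rotation of a two-factor loading matrix by the number of loadings close to zero, in the spirit of counting variables lying in a hyperplane, and searches whole-degree angles for the best score. The loadings are hypothetical: a known simple structure is first rotated to an arbitrary position and then recovered.

```python
import math


def rotate(loadings, degrees):
    """Orthogonal rotation of a two-factor loading matrix through the given angle."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def near_zero_count(loadings, threshold=0.1):
    """Number of loadings within +/- threshold of zero (a crude simple-structure score)."""
    return sum(1 for row in loadings for x in row if abs(x) < threshold)


def best_angles(loadings, threshold=0.1):
    """Best score over whole-degree rotations from -89 to 90, and the angles attaining it."""
    scores = {deg: near_zero_count(rotate(loadings, deg), threshold) for deg in range(-89, 91)}
    best = max(scores.values())
    return best, [deg for deg, score in scores.items() if score == best]


simple = [(0.8, 0.0), (0.7, 0.0), (0.0, 0.6), (0.0, 0.9)]  # hypothetical simple structure
arbitrary = rotate(simple, 30)                              # the same factors, rotated away
best, angles = best_angles(arbitrary)
print(near_zero_count(arbitrary), best, -30 in angles)  # 0 4 True
```

Several neighbouring angles tie for the best score, which is itself a small instance of the non-uniqueness Horst complains of.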

In summary, then, the factor techniques should result in more economical and accurate predictions of socially significant behavior, provided the administrative, computational, and technical limitations can be overcome.



DISCUSSION

PARTICIPANTS:
PHILLIP RULON, WILLIAM STEPHENSON.




Dr. Rulon: Mr. Chairman, if this group won't accuse me of arrogance, I should like to suggest that the first two speakers are both right. I know that will disappoint Mr. Buros, who likes to have arguments. Just to show that I am not really arrogant, I will not attempt to prove that all three speakers are right.

At the Graduate School of Education at Harvard, we think we have just solved the fundamental problem of guidance; that is, we have extended the Fisher discriminant function concept to four dimensions, we think. That means, well, let me give you the example we think we have worked out, using some data provided by Dr. Henry Dyer of the … College of Harvard.

We have given nine tests to five groups of people, and then extended the Fisher discriminant function concept to four dimensions. (Let's see: in the ordinary discriminant function you have two groups in one dimension; you would have three groups in two dimensions, on a plane, and five groups in four dimensions.) We think we have computed four sets of discriminant coefficients, so that if a youngster comes along and we give him these nine tests, we can compute the coordinates for this youngster and assign him a point in space. We can then compute the cross-dimensional distance, or the slant distance, or the diagonal distance in four dimensions for this individual from each of the five groups. And, of course, the shortest distance tells which group he belongs to, just as in the ordinary discriminant function the shorter distance along the line, either plus or minus, tells which kind (I believe it was which tribe or species) the specimen belonged to in the original Fisher discriminant function.

It also means, therefore, that if this system works, it will be possible to give one hundred tests to twenty professional groups (doctors, lawyers, you know the like), and when there comes along a subject for vocational guidance, we compute his coefficients, assign him a point in space, and list the slant distances of his point from the centroids of the groups, the shortest distance, of course, determining the group he looks like.

It looikn as though the System is 
fatlnnfe in thc^same sense that the 
initial discrimmant function is (aA* 
safei that is, if we are not*m<king^any 
discriminatior, these slint distances 
win all come out the same, and the 




1949 INVITATIONAL CONFERENCE 



system therefore apparently has the virtue that multiple regression has: if you are not doing any predicting, the system will show it.

Now, obviously, what Mr. Horst says about administrative difficulties in calculation is going to plague us. A tremendous amount of calculation is necessary. It took us approximately three months to get four columns of nine coefficients each in the four-dimensional case for five groups in only nine tests, and everybody knows that this work goes up approximately by the cube (I think that is correct) of the number of entries. But the reason why I would like to suggest that both of the first two speakers are right is that the approach is strictly practical, following Dr. Bennett. The validities and distances are in terms of the observed test scores themselves; that is, we collect the scores from the individual, and those are what get the discriminant weights.
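Dr. Rulon's rule of thumb can be turned into a rough estimate. The sketch below assumes, as he does tentatively, that the work grows as the cube of the number of entries, and it further assumes that "entries" means the number of tests; the helper `relative_effort` is hypothetical, and the result is an extrapolation, not a measured time.

```python
def relative_effort(n_new, n_old, power=3):
    """How many times more work if effort grows as (number of entries) ** power.

    The cube law is the speaker's tentative rule of thumb, not a measured fact.
    """
    return (n_new / n_old) ** power


if __name__ == "__main__":
    ratio = relative_effort(100, 9)  # one hundred tests instead of nine
    months = 3 * ratio  # scale the reported three months of hand computation
    print(f"about {ratio:.0f} times the work, roughly {months / 12:.0f} years by hand")
```

Under these assumptions a hundred-test battery needs on the order of a thousand times the labor of the nine-test case, which is the arithmetic behind the appeal to the IBM computer.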

But as soon as you decide to apply this thing practically (and that we have no worksheets for), you see that you will have to minimize the number of tests you give; that, if you are going to cover the distinctions between professional groups, say, with any feasible battery, you will have to use a battery which covers them maximally with the smallest number of tests, and so you have the theoretical problem of what does distinguish these people, and you will have to apply the factor-analysis method, I should say, to derive the tests that you will use in order to get maximal coverage. But the real problem is just the same: the man comes along and wants to know to which group he belongs, and I think he will want to know it in terms of real-life groups (doctors, lawyers, ministers, ditch diggers, or what not), and the counselor, or the guidance officer, will not be very much interested in abstractions like factor loadings.

I think that our system is sound, but we don't know yet. I have seen the four columns of nine coefficients, but there is an awful lot of checking to do. One material problem is what is the unit of measurement in the cross-dimensional distance, because distances are denominate; I mean, they are denominate numbers, so the unit of measurement is important. I would like to suggest that we are going to need (even if the system works, and we have our fingers crossed, because if the system works, we have practically what amounts to an A-bomb in the guidance movement) a tremendous amount of machinery to [...]

I might say as an aside that you will notice, if you do give a hundred tests to twenty professional groups, and then you call in the IBM selective sequence computer to get these weights, you have to do that only once. The guidance counselor, when his subject comes to him, takes the scores and applies the weights, but with an ordinary desk computer or with an ordinary IBM installation. The solution of the basic problem, in numerical terms, requires the use of the IBM selective sequence computer only once



[5S) 



5' 3 



TESTING PROBLEMS 



for the main battery. However, please notice that you could not afford to pay the $300-an-hour fee for the IBM computer up in the World Headquarters building unless you were pretty sure that you had good coverage on the distinctions between these professions. And I suggest, in order to be sure you have that pretty good coverage, you had better use the analytical procedure suggested by Dr. Eysenck. But to me, the answers we are looking for are in terms specified by Dr. Bennett.

Dr. Stephenson: May I suggest that, instead of analyzing these things, we compound something so we don't have to handle more than one or two scores in the end?

You see, the mistaken outlook of our friend, George, is that when you factorize and discover factors G, V, K, and so forth, you should measure these separately. The truth is that there might be a test which involves all of these factors, and that test is the one you should want to use. It is, in fact, the one that he actually used to begin with, and that is the one we might well continue to use. There is no [...] myself, who thinks that we should try to measure the little v and make any particular determination from it. However, the point is that perhaps we should begin to turn ourselves upside down and, instead of analyzing, let us pile the statistics together. Perhaps we won't need all these awful machines.






PANEL III
Information which should be provided by Test Publishers and Testing Agencies on the Validity and Use of Their Tests



Information which should be provided by
Test Publishers and Testing Agencies on
the Validity and Use of Their Tests



HERBERT CONRAD






APTITUDE AND INTELLIGENCE TESTS

As all of you know, the information which test publishers provide in connection with their tests generally leaves something to be desired. Sometimes we find a test manual consisting of a page, perhaps, and no more. Sometimes we find a statement to the effect that the validity of the test is insured by the care exercised in its construction ("standard techniques" having been employed), or, again, we may find merely some bland reassurance that the best evidence of the validity of the test will be found in its successful use.

You are all familiar with the excessive claims which, if not made, are insinuated. In the case of one well-known test, for example, a footnote explains that the validity and reliability measurements presented refer to the test [...] or very little is said; or only very vague statements are given regarding the nature of the sample, the reliability of part scores, the factors measured by the test, the nature of the criterion groups, the nature of the criterion, the correlations of subtests with the criterion, and so on.

[...] we might justly ask just what the test publishers ought to provide. All of us [...] have a pretty good idea of what [...], but perhaps a reorganization and restatement will be in order, and at the conclusion, I shall give my [...] publishers.

[...] to me that the test manual [...] information given by test publishers [...]. That ought to be spelled out in considerable detail [...]




Secondly, we should know the answer to the question, What criteria were employed in validating the test? And do these criteria match the purposes which are announced for the test? The criteria may be immediate criteria; for example, an algebra aptitude test is given by a teacher who wants to know what the achievement scores or grades will be of the students in algebra that very next semester. The criteria may be more or less intermediate, such as in the case of an elementary test being used to predict the kind of course in high school that might advisedly be taken by a student, such as classical, college preparatory, commercial, general, and so on. Or the criterion may be remote and ultimate, in the sense that it measures performance on the job as an adult. Educators, I think, are interested in all three of these kinds of criteria, and very few tests give information about even one.

Once the criteria are announced, we would have to examine into their validity. For instance, in the case of the algebra teacher, we would want to know how valid are the grades of that teacher. In the case of the high school courses, we would similarly want to inquire into the validity of the criteria there. And, of course, most of all we would want to inquire into the validity of the [...] ultimate [...] performance criteria.

So far, we have not mentioned anything about the test except its purposes. We have talked about criteria. With respect to the test itself, there are [...] questions.

First of all, how much of the test score represents nothing more than chance, that is to say, how much of the test is, so to speak, [...]? That question applies not only to the test itself but also to part scores or subtests.

Secondly, we should like to know something of the internal consistency or functional unity of each of the subtests. That is usually obtained by item analysis. Relatively few tests report data on item analysis in any detail.

Third, we should like to know to what extent the test measures a mixture of speed and power. At the present time, so far as I know, there is no very uniform method by which the speed component is measured. Some people use the number of cases or proportion of cases attempting the last item. Dr. Tucker of the Educational Testing Service uses, among other things, the ratio of the standard deviation of the number of items answered correctly to the standard deviation of the number of unattempted items. Obviously, if the standard deviation of the unattempted items is very great, then the speed factor is, presumably, fairly important in determining individual differences in the test scores.

Fourth, we should like to know how this particular test compares with other tests. How is it related to other tests? Here, what we need is a table of intercorrelations, which seldom graces a test manual, or, beyond that, a factor analysis with the particular test in which we are interested placed in a battery that is especially designed to reveal the factorial composition of the test.
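Two of the questions above lend themselves to simple computation. The sketch below assumes multiple-choice items with equally likely options, so that blind guessing yields an expected score of items divided by options. It then computes the two speed indicators Dr. Conrad mentions: the proportion of examinees reaching the last item, and the ratio of the standard deviation of number right to that of number unattempted. Dr. Tucker's procedure uses this ratio "among other things," so the function covers only the part stated in the text; all names and data are hypothetical.

```python
import statistics


def expected_chance_score(n_items, n_options):
    """Expected number right if every item is answered by blind guessing."""
    return n_items / n_options


def proportion_attempting_last(responses):
    """Share of examinees who reached the last item.

    `responses` holds one answer list per examinee; None marks an item
    that was not attempted.
    """
    return sum(answers[-1] is not None for answers in responses) / len(responses)


def speed_ratio(number_right, number_unattempted):
    """SD of number right divided by SD of number unattempted.

    A small ratio (a wide spread in unattempted items) suggests that
    speed accounts for much of the spread in scores.
    """
    sd_unattempted = statistics.pstdev(number_unattempted)
    if sd_unattempted == 0:
        return float("inf")  # everyone reached the end: no speed variance
    return statistics.pstdev(number_right) / sd_unattempted


if __name__ == "__main__":
    print(expected_chance_score(100, 5))  # 20.0 for a hundred five-choice items
    answers = [["a", "b"], ["a", None], ["c", "d"], [None, None]]
    print(proportion_attempting_last(answers))
    print(speed_ratio([10, 12, 14, 16], [0, 4, 8, 12]))
```

The population and sample standard deviations give the same ratio, since the sample-size factor cancels.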

Fifth, we should like to know to what extent the test is affected by various external or extrinsic factors, such as practice, coaching, special experience, cultural factors, and so on. This applies particularly to tests of aptitude and intelligence, which are the subject of this talk. Especially in the case of mathematical aptitude at the higher levels, it seems very difficult to obtain a measure of aptitude that will not reflect a person's courses in mathematics: their recency, the grades that he made in them, his diligence, and so on.

Sixth, we should like, I think, to place this test into a correlation matrix where the other members represent criteria, or criterion factors, since one of the prime requirements of a test for greatest usefulness is that the test correlate high with one criterion and low with the others. That is essential if you are to have any kind of differential prediction and, as was pointed out by Dr. Rulon this morning, the problem of guidance is basically one of differential prediction.
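The sixth requirement can be checked directly once a test and several criterion measures are available for the same students. A minimal sketch with invented data: `pearson_r` is an ordinary Pearson correlation written out by hand, `criterion_profile` ranks the criteria by their correlation with the test, and `differential_margin` (a hypothetical summary, not a standard index) reports how far the best criterion stands above the next one.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def criterion_profile(test, criteria):
    """Correlate one test with each criterion, highest correlation first."""
    rs = {name: pearson_r(test, values) for name, values in criteria.items()}
    return sorted(rs.items(), key=lambda item: item[1], reverse=True)


def differential_margin(profile):
    """Gap between the best and second-best criterion correlations."""
    return profile[0][1] - profile[1][1]


if __name__ == "__main__":
    test = [1, 2, 3, 4, 5]
    criteria = {
        "algebra grades": [2, 4, 6, 8, 10],
        "history grades": [3, 1, 4, 1, 5],
    }
    profile = criterion_profile(test, criteria)
    for name, r in profile:
        print(f"{name}: r = {r:.2f}")
    print(f"margin: {differential_margin(profile):.2f}")
```

A test useful for differential prediction shows a large margin; a test that correlates about equally with every criterion shows a small one.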

Seventh, we ought to know what is the contribution of this test over and beyond what is available from other, easier sources. For example, it is very easy to find out the person's chronological age; will our measure tell us something that chronological age does not already tell us? It is also easy, in some cases in a small community, to obtain the person's [...]. If that is so, then the question is, Does the intelligence test or the measure of aptitude tell us anything that is not already told by the previous information? The [...] contribution is certainly something which should definitely be known, but very seldom is it revealed to us in the information which test publishers provide.
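Dr. Conrad does not say how the added contribution should be measured. One common approach, shown here only as an illustration, compares the squared multiple correlation of the criterion with age alone and with age plus the test. The data and helper names are hypothetical, and the two-predictor formula assumes the predictors are not perfectly correlated.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_two(y, x1, x2):
    """Squared multiple correlation of y on two predictors x1 and x2.

    Uses the standard formula in terms of the three zero-order correlations;
    undefined if x1 and x2 are perfectly correlated.
    """
    r1, r2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def increment_over(y, baseline, test):
    """Extra criterion variance accounted for when `test` is added to `baseline`."""
    return r_squared_two(y, baseline, test) - pearson_r(y, baseline) ** 2


if __name__ == "__main__":
    # Invented data for six pupils: age in months, test score, later grade.
    age = [150, 152, 155, 158, 160, 163]
    test = [42, 55, 38, 61, 50, 47]
    grade = [70, 80, 68, 88, 79, 77]
    print(f"R^2, age alone: {pearson_r(grade, age) ** 2:.2f}")
    print(f"increment from the test: {increment_over(grade, age, test):.2f}")
```

An increment near zero would mean the test tells us little that age does not already tell us.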

Eighth, we should ask ourselves, What is the effect of the test on the person who takes it? Most of these aptitude tests are taken in the environment of the school. Does the giving of the test leave the individual discouraged, feeling more hopeless? Does it lead him to think that school is where you are given questions you cannot answer? In other words, what are the side effects, so to speak, of the examination?

Some tests have scoring arrangements (I am talking now of certain tests of the Educational Testing Service) such that either norms are not immediately available, or the scores do not become immediately available. This is a serious handicap, so far as use of the test for practical guidance is concerned. I was talking last summer with Dr. Frank Fletcher, Head of the Occupational Opportunities Service at Ohio State University. For their purposes, promptness of scoring is virtually essential. They bring in a person, let's say, from Cleveland, Sandusky, or other parts of Ohio, for two days of testing and counseling. They might give the Educational Testing Service's pre-law test, but since the test is scored back at Princeton, they would have no satisfactorily prompt knowledge of what the law test scores are. Now, there may be administrative problems, problems of expense and other problems, that produce these late norms and late scores, but so long as those problems remain unsolved, the usefulness of the test in a guidance situation such as that described will inherently be curtailed. It may be that the prime purpose, the original purpose, of the test is fulfilled, but it seems too bad that other normal purposes cannot also be fulfilled.

And, finally, it seems to me that we ought to expand our notion of validity and responsibility to include, let us say, the elimination of muddleheadedness by the users of the test. I can give you a clear example, close to home. Colleges and universities require that students applying for admission to college take the Scholastic Aptitude Test, giving a Verbal and a Mathematical score, and, in addition, take an English Composition Test. It has been found in a study by Miss Edith Huddleston, which I think is a model in many ways, that the Scholastic Aptitude Test will predict English composition scores and grades better and more cheaply than the English Composition Test. This was discovered at least one year ago. But unless colleges are much more responsive than I think, I dare say that these colleges are still requiring the Scholastic Aptitude Test and the English Composition Test.

I think it is the responsibility of the test publisher to point out very vigorously to such persons that, so far as can be seen, the giving of the second test, the English Composition Test, is not necessary. It is a waste of money and a waste of time. In the time required by the English Composition Test, some other test could be substituted, for other purposes and to better effect. What I am saying, in effect, is that matters of policy are not, or should not be, considered outside the province of the test publisher. Let me give an analogy.

A person who is producing drugs, an ethical manufacturer of drugs, does not simply put a drug on the market and say, "Well, if it is misused, it is not my fault. After all, it says here in the print that they don't have to misuse it if they don't want to and if they have any brains." The ethical manufacturer goes to some little trouble to see that the thing is not misused. The same is true, let us say, of the manufacturer of special equipment. New radar equipment, set up on a ship, would not be worth very much unless the manufacturer went on that ship and saw that the radar equipment was used properly. That is part of the manufacturer's job. It should be part of the test publishers' job to take the same responsibility.

Well, I dare say that all of this is more or less familiar to you. We know that at least most of these things (I think you will agree) should be done. What I should like to emphasize is that we ought to follow through on our testing, and not simply say, "Here [it] is, available." If we think that local norms would be useful, we ought to do what we can to see that the local norms are correctly obtained. If we feel that just giving the means and standard deviations is not enough, we might give an expectancy table and indicate how it may be used; with that expectancy table, we ought to be very careful to point out that the expectancy for extreme scores is less reliable than the expectancy, let us say, for scores around the mean.
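An expectancy table of the kind described here cross-tabulates score bands against later success. Reporting the number of cases in each band makes the caution about extreme scores concrete, since the end bands usually rest on very few cases. A minimal sketch; the band width, the pass/fail criterion, the helper name, and the data are all hypothetical.

```python
from collections import defaultdict


def expectancy_table(scores, succeeded, band_width=10):
    """Return {band_lower_bound: (cases, proportion_successful)}.

    A band labelled 40 holds scores 40 to 49 when band_width is 10.
    `succeeded` holds True/False for each examinee on the later criterion.
    """
    tally = defaultdict(lambda: [0, 0])  # band -> [cases, successes]
    for score, ok in zip(scores, succeeded):
        band = (score // band_width) * band_width
        tally[band][0] += 1
        tally[band][1] += int(ok)
    return {band: (n, hits / n) for band, (n, hits) in sorted(tally.items())}


if __name__ == "__main__":
    # Invented scores and later pass/fail outcomes for ten students.
    width = 10
    scores = [12, 35, 38, 41, 44, 47, 52, 55, 58, 91]
    passed = [False, False, True, True, False, True, True, True, True, True]
    for band, (n, p) in expectancy_table(scores, passed, width).items():
        print(f"{band:>3}-{band + width - 1:<3} cases={n:<2} expectancy={p:.2f}")
```

In the invented data the lowest and highest bands each contain a single student, so their expectancies of 0.00 and 1.00 are far less trustworthy than those of the middle bands.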

But, to go on, granted that this information is desirable, or that most of it is desirable, we know it is not forthcoming in most cases. Why is that? Is it unnecessary? Most of it is both desirable and necessary. Would giving the information possibly reveal too much of the shortcomings of the test? I think that is true, in some cases, but no ethical publisher would on that account fail to reveal the facts about his test. If his test is defective, you can be pretty sure that other tests are equally, if not more, defective, assuming that the publisher is experienced and capable in the selection of tests which he publishes.

Is it too expensive to provide the information we have asked for? The answer to that, I think, is a resounding, "Yes, it is too expensive." I can imagine that, if the Educational Testing Service, for example, did half the things I have been talking about, it just could not continue in business without continual grants from outside sources. In other words, an ethical producer of tests, if he is asked to meet all of these ideal requirements, would soon find that he is the goose that tries to lay the golden eggs with every test, but there just isn't enough of that gold in the stream.

What can we do about the matter? Well, we can't pass a law. That seems pretty clear. Can we educate consumers to demand and pay for the superior product? I think that college teachers are succeeding in educating the consumer; the test producers are also educating the consumer. It seems to be a pretty slow process. It is basic. It is necessary. Is there anything that could be done to speed it up?

I have one suggestion which I think has been made before, namely, the creation of an impartial bureau of standards for test validation which would gain enough prestige so that its word would mean something in the market of tests. Such a bureau should not be connected with any university. It is "hot" business, the business of validating tests and saying which is which, which is superior or inferior, and so on. A university in general cannot stand it. And it should not be a governmental enterprise, for the same reason. Nor should it be an enterprise dominated by any one test-producing organization, because if it were, the results would be suspect. I feel sure that if the author of the Ohio State Psychological Test, for example, were to validate his test, while somebody else in the Educational Testing Service validated the Scholastic Aptitude Test, it would be pretty hard to avoid unconscious bias, and there would always be the suspicion that unconscious bias had crept in. So it has to be an independent organization.

Well, of course, the question comes in, Where is the money going to come from for that? The money can only come, it seems to me, if the bureau acquires enough prestige so that the consumer is willing to pay for the badge of approval and for the stock of information, willing to pay enough extra so that this bureau can be supported on a trial basis from the test-producing organizations themselves.







Information which should be provided by 
Test Publishers and Testing Agencies on 

the Validity and Use of Their Tests 



PAUL L. DRESSEL



ACHIEVEMENT TESTS 



Data on Test Validity

Although the emphasis in the title of this paper is on "should be," it seemed to me advisable to look first at what "is" as a basis for talking about what should be. Perusal of manuals, supplemented by the reading of reviews in the Mental Measurements Yearbooks, resulted in the listing of many different types of evidence which have been provided on the validity of achievement tests. These can be classified into five relatively distinct types of evidence, as follows:

1. No evidence
2. Expert opinion
3. Current practice
4. Statistical
5. Face validity (validity by assumption)

The no evidence category means just that. The author and publishers furnish no evidence on validity, on the criteria for construction, or on the significance of the results. Expert opinion includes statements to the effect that validity or appropriateness of content are insured by the experience of
the authors, the criticism of specialists, or adherence to the recommendations of certain councils or committees. Current practice includes statements that the test content represents best teaching practice, that it is based on reputable criteria or an analysis of textbooks. Statistical evidence includes data or statements indicating that the test discriminates between good and poor students, that the scores correlate to such and such an extent with grades or point average, that all items used have been statistically validated, that items have been arranged according to difficulty. Face validity is a much abused and, at this stage, somewhat disreputable term. Obviously, statements which I have classified as expert opinion and current practice might be considered evidence of face validity. In terms of Mosier's classification of types of face validity, these would seem to involve validity by definition or on the basis of previous research by others. There are, again according to Mosier, two additional types of face validity:
The assumption of validity on the basis of a common-sense relationship to the objective, and validity by appearance. When the simple statement is made that the test has face validity, the author or publisher seems most frequently to mean one of these last two, that is, validity by assumption or by appearance.

Use of expert opinion, current practice, and face validity as evidences of validity is questionable, particularly when, as in many, perhaps most, cases, the experts, the basis of determination of "best" teaching practice, the textbooks, etc., are not made known to the test user. The user has no way in which to check the statements made and has the alternative of accepting them at their face value or of ascertaining validity through his own efforts. Such reporting is not in accord with the principles of scientific research.

Relationship of Validity to Reliability

The review of test manuals also revealed the following tendencies regarding validity and reliability:

1. Tests of high reliability are frequently assumed to be highly valid.

2. Items yielding high reliabilities are consistently selected over those yielding low reliabilities.

3. The number of objectives measured is restricted for the sake of greater homogeneity and consequent higher reliability.

4. No distinction is made among the various sources of variation, such as

(a) Variations among or within individuals

(b) Degree of difficulty of the material

(c) Sampling of the area

These tendencies suggest that despite the extensive amount of material written about validity and reliability, there exists none too clear a distinction in practice. This confusion is compounded of ignorance, lack of clarity in the concepts, and also lack of appropriateness to the basic purpose to be served. For example, various meanings assigned to validity include:

(1) The extent to which a testing technique gives evidence of mastery of the desired technique.

(2) The extent to which the test indicates status relative to the universe of which these items are a sample.

(3) The extent to which we can predict something from the test.

(4) The extent to which the test indicates the ability to handle real life situations, that is, situations outside of the classroom.

These four concepts, except for some rewording, are essentially ones mentioned by John Flanagan at the 1948 session of this Conference. Any one of them is applicable to achievement testing in some ways and yet also inapplicable or at least unsatisfactory in others. Let us examine each concept to clarify this.

Recalling a fact and selecting it from a group of proffered responses are not the same. However, a completion test and a multiple choice test on the same facts can be set up, and the correlation between the two is an indication of the validity of the multiple choice test as a substitute for the completion test. However, as one deals with more complex objectives, the problem of accurately rating an unstructured response becomes so great that the decision as to whether a structured test situation indicates mastery of the desired objective becomes more a matter of expert judgment than of statistical manipulation. If our objective of recall of facts becomes one of recall at the appropriate time and place for use, we face a difficult situation.

The second concept of validity, the indication of status relative to the universe of which these items are a sample, is productive of confusion with reliability. If the items involve skills somewhat different from the desired skills, the measure of the extent of overlapping or similarity of the test items and the desired skills is an indication of test validity. The relationship of a sample of the slightly invalid items found in the test to the universe of all such invalid items may be more properly thought of as test reliability than as its validity. On this basis, the correlation of a test with a longer test of the same type is not ordinarily a measure of validity. It will be one only when the test items are shown to be valid in the first way, and then validity and reliability are identical.

Relative to the third concept of validity, prediction, the difficulty lies with the criterion. If we are to accept the usual course grades as criteria and construct our tests so as to yield the highest possible correlations with such criteria, we shall scarcely improve either the tests or educational practice. A well constructed test can much more logically be used to investigate the validity of usual grading practice. We have, in fact, validated tests with grades and grades with tests rather indiscriminately, to the confusion of everyone.

A student last year asked me to explain just this matter. He said, "The Dean says objective tests are used because they are more accurate than instructors' grades. The instructors say the tests are no good because we don't get the same grades. The State News reported that a study had shown a high relation between test scores and grades. I'm confused."

Validity conceived of in terms of the extent to which a test provides evidence of ability to handle real situations puts the maker of a single test in an almost hopeless situation insofar as producing any statistical or other evidence of validity. The adequacy of performance in a real life situation is a judgmental matter, and it is entirely too complicated to pick out that which is relevant to a particular test. Moreover, planned situations are not real life situations, so that it becomes necessary to make extensive and surreptitious observations on all individuals involved in the validity study, a program apt to be productive of situations in its own right. Conceivably this sort of thing may be done to study a total program. It can hardly be done for a single test.

We appear to be eliminating all bases for establishing validity, and we are not yet finished. For an aptitude test which measures some trait presumably not much influenced by training, validity may be construed as the relationship between the test performance and actual performance. An achievement test, however, is used in connection with objectives supposedly, and at least occasionally actually, influenced by education. We are then faced with these additional difficulties:

(1) Validity does not exist except in relation to a particular individual in a particular educational program. This is exemplified by the fact that

(a) an item constructed to measure reasoning does not do so if it has been taught in the course and is then recalled.

(b) an item to measure reasoning is not valid for the single individual who may have seen that particular problem somewhere, even if it is for the rest of the students.

(c) an item involving an objective not accepted or not developed for a course is not valid for that course.

(d) a test not involving all the objectives of a course is not valid unless this deficiency is recognized.

(2) In an attempt to get validity, items are selected on the basis of evidence that students show change with regard to them. Our tests are then loaded with materials on which change is most easily obtained and tend, therefore, to perpetuate certain (and frequently bad) instructional practices. Test items which are best in terms of statistical evidence are frequently trivial.




(3) Emphasis on national, regional, or even local norms is frequently related to validity in that such norms show the wide differences in performance of individuals and of groups. Actually this evidence tends to obscure the more important questions of

(a) actual gains by individuals and groups relative to various objectives

(b) what constitutes a reasonable, or an optimum, gain over a given period or as a result of certain courses

(c) differential gains for students of differing abilities. The most capable students should show the greatest gains, not just be at the top of the distribution.
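Questions (a) to (c) can be examined with pre-test and post-test scores for the same students. A minimal sketch with invented data: it reports the mean gain overall and within ability groups, so one can see whether the most capable students show the greatest gains. The ability labels are assumed to be supplied from some other measure, and the helper names are hypothetical.

```python
from collections import defaultdict


def gains(pre, post):
    """Gain for each student: post-test score minus pre-test score."""
    return [after - before for before, after in zip(pre, post)]


def mean_gain_by_group(pre, post, groups):
    """Average gain within each ability group label."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of gains, count]
    for group, gain in zip(groups, gains(pre, post)):
        totals[group][0] += gain
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


if __name__ == "__main__":
    # Invented pre- and post-test scores; ability labels are assumed given.
    pre = [50, 55, 60, 70, 75, 80]
    post = [54, 58, 66, 80, 86, 92]
    ability = ["low", "low", "middle", "middle", "high", "high"]
    overall = gains(pre, post)
    print(f"mean gain, all students: {sum(overall) / len(overall):.1f}")
    for group, g in mean_gain_by_group(pre, post, ability).items():
        print(f"{group}: {g:.1f}")
```

In this invented example the high group gains most (11.5 points against 3.5 for the low group), a pattern that status norms alone would not reveal.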
Up to this point we have been critical, and destructively so, but we can now base a positive and, I hope, constructive set of recommendations on this criticism. At the first stage it would seem to be necessary that some attention be given to clarification and careful statement of some of the more important educational objectives (not necessarily only those most emphasized in practice). These should be defined not just in words but in terms of actual problem situations, with detailed analyses of the kind of behavior which is or should be elicited.

Ha%ed upon %uch a statement wc 
would then eapec^ a trit author or 
dfStr^buting agenty to 

(t) state the sipenfir nhjfiiftvrs 
invfred by a intand mduatf ihe reb- 
live emphaiis * n each. 

( a ) pn>¥Kjf f VMlrni f that the timk- 



TESTING 

ing done by iiudenu in hindUng the 
tf m k fhat involved in these objectivet. 

(3) prowie evidence of the initud 
sutus of rturiam types of groqw reb* 
five to these objectives. 

(f) provide evidence on possible 
tnd deiorabk gtins which can be mide 
under certAsn condn^ons* 

(5) list the knowledge, facts, and principles needed for handling the test. This is particularly important for tests involving applications and critical thinking.

(6) list desirable educational objectives not covered by the test and suggest ways of supplementing to get a well-rounded evaluation program.

(7) de-emphasize status norms and place emphasis on growth.

(8) describe the level of functioning of individuals falling at various points in the distribution of test scores.

(9) indicate the extent to which the sampling of a particular type of behavior is adequate for ranking individuals with regard to that type of behavior.

(10) provide detailed information on textbook analysis, instructional practice, or other such expert-opinion or current-practice approaches to validity.
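
Item (7), together with point (c) above, asks a test to report growth rather than status alone. As a minimal sketch of what such a growth report might compute, the Python below takes hypothetical pre-test and post-test scores, groups students by prior ability, and checks whether mean gains rise with ability, as the text says they should for the most capable students. The records, group labels, and helper names are illustrative assumptions, not material from the paper.

```python
# Hypothetical growth report: mean gain (post-test minus pre-test) for
# students grouped by prior ability, instead of a single status norm.
# The records, group labels, and helper names are illustrative assumptions.
from statistics import mean


def mean_gains_by_group(records):
    """Return {group: mean gain} from (group, pre_score, post_score) tuples."""
    gains = {}
    for group, pre, post in records:
        gains.setdefault(group, []).append(post - pre)
    return {group: mean(values) for group, values in gains.items()}


def gains_rise_with_ability(gains, order):
    """True if the mean gain strictly increases along `order` (low to high ability)."""
    values = [gains[group] for group in order]
    return all(a < b for a, b in zip(values, values[1:]))


records = [
    ("high", 70, 85), ("high", 75, 88),
    ("middle", 55, 65), ("middle", 60, 68),
    ("low", 40, 46), ("low", 45, 50),
]

gains = mean_gains_by_group(records)
print(gains)  # {'high': 14, 'middle': 9, 'low': 5.5}
print(gains_rise_with_ability(gains, ["low", "middle", "high"]))  # True
```

A report built this way puts the emphasis on how much each group grew; a flat or falling pattern of gains across ability groups would run against the expectation stated in point (c).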

It may be objected that these suggestions do not result in new statistical demonstrations of validity and reliability. If such a procedure were followed, however, the prospective test user could select and use a test with some understanding and with some assurance that the results would be useful. He would be equipped to use it to determine the amount of actual progress made. He would have some idea of what to do if he found that inadequate gains were registered. Such tests would be valid because they would have value, and they would be reliable because teachers would rely on them.

I really have said what I have to say on the assigned topic, but I have one more related point to make. Test makers have been in a state of unstable equilibrium in that they have been constantly reaching ahead for opportunities to show teachers the value of scientific testing and at the same time leaning over backward to avoid the criticism of controlling or determining curriculum or teaching methods. In trying to mold tests to fit teaching practice, testing has been guilty of perpetuating practice. We cannot avoid influencing instruction, and we had better face the issue clearly and boldly.

I believe that a qualified subject-matter man who seriously turns his attention to evaluation attains a degree of insight into the student's mind and into the significance and interrelationships of the subject that is far beyond that of the average teacher.

I believe that as such an evaluator seeks for and finds situations which give opportunity for a student to show whether he has a certain skill or ability, he also finds that these same kinds of situations are the best situations in which to place the student so that he may get practice in such a skill or ability.

I believe that the evaluator has an obligation to show how such situations can be used to improve instruction, and that he should be more concerned with this than with preserving the secrecy of his test items so that they can be used for grading.

I believe that many instructors, particularly in general education programs, are sincerely trying to find ways of developing thinking skills and other objectives beyond factual knowledge without quite knowing how to do it, and if we do not adapt our evaluative materials for instructional use, we play the fool by trying to test and grade students on something for which they have had no training.

Our true job in achievement testing is to aid teachers to get maximum achievement. Ranking or grading should be incidental, and yet I believe it is obvious that to date our testing has served primarily to concentrate the attention of teachers and administrators on grading, ranking, and selection. It is high time that measurement and evaluation assist rather than distract the classroom teacher.







Information which should be provided by
Test Publishers and Testing Agencies on
the Validity and Use of Their Tests



LAURANCE F. SHAFFER



Hardly a month passes without bringing some new measure of personality, either issued by a test publisher or announced in a book or journal article. The appearance of these many tests is itself a social phenomenon of great interest. Educators and psychologists are no longer content to deal only with the intellectual and academic aspects of people, but increasingly recognize the importance of motivation and emotion in human life. Mental hygiene and personal adjustment have come out of the clinic, and are emphasized in education, industry, and many other fields. The flood of personality measures is a response to a great need to measure more aspects of whole people. But critical observers find that their quality lags behind their quantity. Few of them have firmly established themselves as validated measures of any aspect or aspects of personality. The need for the tests is so great, however, that many come to the market sadly lacking in evidence of their worth. The equation seems to be that a pressing demand for personality measures, plus almost insuperable technical difficulties, equals many bad tests.

What data should be supplied to attest to the validity of a personality test? A first and very modest answer might be that some evidence should be given rather than none at all. Even so small a demand is unmet by many tests. During the past year two elaborate clinical tests that draw sweeping conclusions on many aspects of personality have been described by whole books. In each, the only evidence of validity is an appeal to the clinical experience of their authors, derived from studying hundreds of cases. The user is asked to take the tests on faith, and to seek evidence of validity that is private and subjective, in place of scientific progress. The two tests may be very effective measures of personality, but we shall never know with certainty until they are subjected to sound and communicable investigation.

Tests whose simpler design makes them easy to study objectively have also been published without validation. There are numerous questionnaires intended to select students whose personal maladjustment demands individual attention, which are quite unaccompanied by any evidence of having identified such students. The items and scored responses have been selected because experts think them pathological, or because they have been included in preceding similar questionnaires, or because they correlate with one another in internal consistency. Such instances may lead test users to read test manuals critically and to reserve many such questionnaires for research studies.

Although tests should possess some evidence of validity, the nature of the information may vary greatly according to the character of the test. Not all personality measures are alike. They differ widely both as to the purposes to which they are applied and the uses for which they are designed, and the problems of validation vary accordingly with the nature of the test.

Personality tests vary in the degree to which they depend on the skill of the examiner. At one extreme are techniques that yield no scores but evaluate qualitative components of the personality. Included at this end of the continuum are the Thematic Apperception Test and other picture-story fantasy tests. No novice can use such a test effectively, for it is only a means for eliciting responses that are evaluated by a thorough knowledge of the dynamics of human behavior. The fundamental measuring instrument is the clinician himself rather than the test. When the test has no validity apart from its user, it is perhaps irrelevant to speak of its validity. One can, however, study the degree to which the clinician is valid with and without the aid of a certain technique, and thereby estimate the validity of the test indirectly. Up to the present, few such studies have been reported.

At the opposite pole of the same continuum are questionnaires that are administered objectively and scored by machine or even by the examinee himself. Here dependence on clinical skill is certainly at its minimum, and the test instrument can be evaluated in isolation. Therefore, the test maker has a clearer obligation, as well as a much easier opportunity, to demonstrate validity in unambiguous terms.

Another important way that personality tests differ is in respect to the number of hypotheses that they undertake to test. A single-hypothesis test, yielding a single score, attempts to predict a single outcome. Since such a test gives the opportunity for explicit validation, the demand for rigorous evidence is justified. In many instances, it is practicable to set up a validation
experiment with a design like that used to study the effectiveness of an aptitude test. Several questionnaires constructed in the armed forces during World War II were reasonably valid predictors of failure to meet the personal requirements of military service. Perhaps the unusual motivation of men in the service contributed to the success of such tests. Equally important, and too little explored in analogous civilian situations, were the definite and operational criterion terms, the large numbers of cases tested, and the rigorous analysis of item validities.

The use of operational criterion terms solves many of the existing difficulties in the validation of personality measures. It is hard to validate a test that claims to identify "neurotic" students, because no one can define exactly what is meant by "neurotic." Tests to identify students who will seek the help of a counselor, or who will get lower grades than indicated by their abilities, or who will drop from school for other than economic reasons, might be practicable because the criteria are explicit.

When a test has been constructed to predict a specific outcome, the user wants to know in simple terms how good it is. In addition to the usual validity data expressed in terms of correlation or the significance of the difference between groups, the test maker can give a simple table showing, for each prediction, the percentage of accurate identifications. Such a table will be of greater value to the user than many more sophisticated tables.

Many personality tests attempt to investigate multiple hypotheses rather than single ones. Their aim is to describe the whole personality, or large areas of it, rather than to answer specific questions. Examples of these tests vary from questionnaires that yield profiles of from four to ten scores, to clinical devices such as the Rorschach, from which almost infinitely varied personality descriptions may be drawn.

Multi-dimensional personality tests offer problems of validation that have never been solved satisfactorily. Indeed, it is possible that the ordinary concept of validity is completely inapplicable to them when they are considered as broad descriptive methods. A full description of personality can be validated only by comparing it with some equally full description obtained by another method. The question then arises as to what is the criterion and what is the dependent variable. If we compare the whole Rorschach with the results of a long and well-planned series of interviews, which is the criterion? Do we judge the Rorschach by the interviews, or the interviews by the Rorschach? It is feasible and valuable to compare the techniques, but the comparison is not validation in the ordinary sense.

The user of a multi-dimensional test often conceives his purpose as purely descriptive. He wants to understand the person fully and deeply. In many instances this is a worthwhile objective. To really understand another person, for the sake of the understanding itself and without ulterior motive, is a rich and satisfying experience, akin to the aesthetic appreciation of great art or great music. In other cases, the source of the need for understanding is not so pure. To pry into another person's inner life, to crack apart his defenses and view his base motivations and conflicts, may be a peeping-Tom kind of satisfaction, serving the curiosity of the examiner and his need for power. In any instance, aesthetic, morbid, or the many degrees that lie between, a purely descriptive personality study for its own sake alone cannot be validated; it can only be experienced.

Users of multi-dimensional personality tests do not always operate on the abstract level of pure description. They have practical questions to answer. The Rorschach worker asks himself whether the patient is schizophrenic. The college counselor may consult a profile drawn from a questionnaire for a clue as to whether the student's anxiety is centered on academic, family, or peer-relationship problems. These become validatable issues. When a complex test is applied to a specific problem, criteria can be identified, and designs can be devised to study the relationships between the criteria and the test.

It is very likely that many personality tests can never be validated "as a whole" in their descriptive functions. But all applications of such tests are subject to the scrutiny of validation. The maker of a test therefore has a responsibility to determine the uses to which a test will be put and to supply dependable evidence relevant to such uses.

In stressing the functional definition of validity, the emphasis so far has been on the question of validity for what. Equally important is the need to define validity for whom. Although the culture of the examinee has implications for the interpretation of all tests, it has special significance for personality measures. One does not have to go to Bali for examples. Attitudes toward authority that would predict serious personal maladjustment in a boy in a small suburban town may be quite normal for a youngster in a city slum. All too many generalizations substantiated only on mental hospital patients are being extended to college students and other normal adults in the interpretation of some personality tests. It is incumbent on the test maker to specify in some detail the population for which his validities hold. The day is past when a bare phrase such as "ninth-grade pupils" can be regarded as an adequate description of a validation group.

In summary, the validation of personality tests should keep close to operational reality. A personality test as such can hardly be validated. What can be validated is the use of the test for a particular purpose, and with respect to a defined group of examinees. The maker and publisher of a personality measure have an obligation to obtain and furnish information that will let the user know the degree of confidence with which he can apply it to practical human affairs.





PARTICIPANTS



G. Seashore, Paul Horst, Herbert S. Conrad, Joseph Zubin, Herschel T. Manuel, Roger T. Lennon, Irving Lorge, Paul L. Dressel, Laurance F. Shaffer, Truman L. Kelley.



CHAIRMAN BUROS: The recent issue of Personnel Psychology has a column by Dr. Bingham which is both encouraging and discouraging. It is encouraging because it is the presentation of a forceful, excellent idea. It is discouraging to me because, as Dr. Bingham mentioned in his column, this idea has been presented for at least the past twenty-five years. It appears, then, that the lag in the use of expectancy tables is at least twenty-five years and possibly thirty or thirty-five. I should like Dr. Bingham to say a word about this.

(Reprints of Dr. Bingham's paper entitled "Great Expectations," from Personnel Psychology, Vol. 2, No. 3, Autumn, 1949, were distributed.)

DR. BINGHAM: Mr. Buros and Members of the Conference: In reading, the day before yesterday, the report of the meeting a year ago, when you were discussing the construction of tests, I found a most stimulating and, to me, provocative thought expressed there in some remarks of Mr. Langmuir, who insisted that before starting to build a test we should undertake to define what we want a score on that test to tell us.

Now, he did not say, "what we want the coefficient of correlation to tell us," but "a score." The meaning of a score is what those of us want who are concerned with counseling individuals and who have occasion to look at the cumulative record of achievement test scores, personality appraisals, and aptitude tests.

We want each individual's score to tell us something about him. What is it we want to know? Most of us, I believe, would be grateful to have at hand the information that would help us to know what such-and-such a score on this test tells about the probability that if a person who makes that score goes ahead and takes calculus, or enters the dental college, or undertakes to learn carpentry or whatever the test is for, he will or will not succeed as well as the average member of the particular group.
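
Bingham's question, what a particular score tells about the probability that its holder will succeed, is what an expectancy table answers. A minimal Python sketch under stated assumptions: the score bands, the yes/no success criterion, and the twelve records are all invented for illustration, and `expectancy_table` is a hypothetical helper, not a procedure published by any speaker here.

```python
# Hypothetical expectancy-table sketch: for each band of test scores, what
# percentage of the people in that band later succeeded on the criterion?
# The bands, records, and helper name are illustrative assumptions.


def expectancy_table(pairs, bands):
    """Return a list of (band_label, n, percent_succeeded) rows.

    pairs: iterable of (score, succeeded) where succeeded is a bool.
    bands: list of (label, low, high) with inclusive score bounds.
    percent_succeeded is None when no one fell in the band.
    """
    pairs = list(pairs)  # allow any iterable; we scan it once per band
    rows = []
    for label, low, high in bands:
        outcomes = [ok for score, ok in pairs if low <= score <= high]
        n = len(outcomes)
        pct = round(100 * sum(outcomes) / n) if n else None
        rows.append((label, n, pct))
    return rows


# Invented records: (test score, later succeeded in the course?)
pairs = [
    (82, True), (78, True), (75, True), (71, False),
    (65, True), (62, False), (58, True), (55, False),
    (48, False), (45, True), (41, False), (38, False),
]
bands = [("70-100", 70, 100), ("50-69", 50, 69), ("0-49", 0, 49)]

for label, n, pct in expectancy_table(pairs, bands):
    print(f"score {label}: {pct}% of {n} succeeded")
# score 70-100: 75% of 4 succeeded
# score 50-69: 50% of 4 succeeded
# score 0-49: 25% of 4 succeeded
```

Each row reads as the kind of statement Bingham asks for: of the people who made scores in a given band, this percentage succeeded. Bands with no cases come back as None rather than 0%, so an empty band is not mistaken for certain failure.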






I am glad that preceding speakers, Dr. Conrad in particular, reminded us of the ethics of the pharmaceutical manufacturers. I have been looking into some of their publications about new drugs. They are meticulous to state the contraindications, and to publish the facts as to when you must not use this drug because not enough is known about it yet. "You may use it in such-and-such conditions," they say to the physician, "but not in these others for which it might seem to be useful." Can't we, Mr. Chairman, raise our profession to the ethical level of the pharmaceutical manufacturers and the businessmen who sell drugs to physicians?

CHAIRMAN BUROS: I was especially interested to hear Dr. Bingham compare the amount of information provided by test publishers today with fifteen years ago, to the discredit of the test publishers today. In a paper which I gave at the American Psychological Association meeting in Denver last month, I made the statement, which I will repeat at this time, that the better test publishers today are supplying less information about the construction, the validity, the use, and the interpretation of their tests than the better test publishers twenty-five years ago. The meeting is now open for discussion.

DR. DUROST: Mr. Chairman, I would be remiss in my responsibilities if I were not to rise to that challenge, having been the Director of the Test Division of World Book Company for a period of thirteen years.

I think that is quite incorrect so far as any of the publications by World Book are concerned.

DR. BINGHAM: Quite so.

CHAIRMAN BUROS: And I will add to my remarks that one of the reasons why I can make that statement is that twenty-five years ago the World Book Company played a much more important part in test construction than it does today. I did not name the company in my talk about it at the APA meeting, but I stated that only one company has been consistently attempting to give adequate information about the construction, validation, and uses of its tests. I think that is primarily due to the influence of Truman Kelley and Giles M. Ruch, who insisted right from the beginning that they place the cards on the table. Now, the World Book Company does not have a clean record, by any means, but compared to the other publishers it has a record they can look at with envy. Do you agree with that, Dr. Bingham?

DR. BINGHAM: I do.

DR. DUROST: Well, thank you very much. I would like to make one more point...

CHAIRMAN BUROS: Excuse me, but let me mention just one more point on that, to give you an idea. In the days when Truman Kelley and Ruch and Terman were putting out the Stanford Achievement Test, they furnished a wealth of information about those tests. Now, the current edition of the Stanford Achievement Test does not possess that wealth of information. As a matter of fact, that manual is nonexistent. In other words, the World Book Company is slipping, I think, from the days when it had the influence of these early pioneers in the testing movement.

DR. DUROST: May I say that that manual was written, and only the untimely death of Dr. Ruch prevented its publication, and that the later Metropolitan Series is accompanied by a manual of the sort you describe.

I would like to make what seems to me a much more penetrating comment on this whole problem; that is, that the test authors, after all, are the ones who should be pressuring the publishers and not the other way around. You do the publishers a great deal of honor in suggesting that they should be the ones to set the standards, but, after all, historically speaking, the publishers have only been the medium by means of which the production of a professional group of this sort has reached the public, and it certainly is an odd circumstance when the publisher has to pressure the profession to give the information that the profession should give in its own right.

DR. SEASHORE: Mr. Chairman, I don't want to quibble with your statement, but I would like to see the document some time. I would like to speak to a more fundamental problem.

The discussion by the panel today turned out to be pretty much a discussion of the nature and quality of validation research, and only incidentally was it a discussion of what should go in the test manual. Even Dr. Conrad's rather cogent remarks, which were more directly concerned with manuals, revolved around that problem. The implication is sometimes given that the publisher is responsible for all the validation research. We like to pride ourselves as psychologists, not as publishers but as psychologists, that we have a free working world in which all users of tests are free to do research. I think it is a matter of record that for most of the tests now existing the bulk of the validation research was not done by the author and was not done at the cost of the publisher, but was done by independent, free-operating psychologists, wherever they happened to be. I think that is good. I think it is necessary. It is independent. When a test publisher puts together a manual, his chief source of information is not the author but the hundred and one, perhaps five hundred and one, test users who will share the information with him. I can report that, for most of the manuals which I have been responsible for editing, the best validation research has not necessarily been the author's, but it has been independent research by university people, by personnel people, and by school users.

That ties in with the notion that Dr. Conrad gave a while ago, that publishers have a responsibility for servicing their tests. I think we do have a responsibility for servicing our tests, and if we could raise the prices high enough, we could have a psychologist in each package. That might work.

Let me give you some economics on this. A test user who purchases about $300 worth of tests a year calls us by long distance and says, "I'm being heckled because some of your tests aren't too good." Therefore, we sent Al Wesman out, at a cost of $200, to find out what goes on in that community. The rest of you paid for that trip. The $200 cost of that trip could not come out of the $300 a year that this one school pays for that special service. We did it because we thought we had a hot spot that needed some fixing, and it did.

Now, the important thing is that the test publisher cannot give that kind of service at comic-book prices for tests. If tests are to be sold on a service basis, then there must be a service fee attached, and a test publisher can go only so far in that. We do our very best by trying various methods of publicity, issuing reports, and making speeches at various places. Maybe we don't do too well. We do about as well as $40-a-week clerks like us can do in the short time we have available to do it. But that is all we can do.

I think the responsibility for the quality of tests and their usage does not reside in the publisher but in the fifty or sixty graduate schools of this country who train test users.

CHAIRMAN BUROS: I should like to let you in on another little secret. Ordinarily, I don't say nice things about the test publishers, that is, with respect to their tests or their manuals, but when the manual for the Differential Aptitude Test battery by Bennett, Seashore, and Wesman came out, I was so deeply moved by the unusual information they were giving that I wrote a congratulatory note to the publishers. That is the only time that has ever happened, although I did have a little postscript in it saying, "However, the main part of the section on validity is yet to come." Nevertheless, it was, I think, a very fine, honest manual, and they made the statement without any qualification that they did not have the data on validity, and they warned the test user that it would be forthcoming, and it has been coming out in loose-leaf form.

Are there any other questions?

DR. HORST: I want to say, first, that I have no particular test publisher or testing organization to defend. Second, that I was very much impressed by the list of criteria that Dr. Conrad gave as necessary for qualifying a test or giving adequate information about a test. I think it is extremely important, and I think something should be done about it. I would like to get some kind of an estimate from Dr. Conrad as to how much he thinks it would cost to provide the kind of service that he suggests in his bureau. Now, I think it would be an extremely good idea if it would work, but I am going to stick my neck out, and then I am going to ask Dr. Conrad to stick his neck out, because by my high-speed electronic computer, I have just calculated that to provide all the information that he wants, and which he has every right to expect, an average test would cost the user, let's say, roughly two or three hundred times as much as it does at present. I would be interested to know what he thinks it would cost.

DR. CONRAD: I probably think the same as you do, but I would make this reservation: that the second form of a test could be treated in more summary fashion than the first; that is to say, the second form would have a certain amount of validity by resemblance, let us say, to the first form. That is not ideal. There should be, perhaps, a thorough check every decade, shall we say, in the case of a repetitive test, but the cost of doing good work is certainly an inhibiting factor.

During the war, I think that some very fine work was done in the construction and validation of tests in the Air Forces, in the Adjutant General's Office, and for the Navy, most of it, or a good part of it at any rate, under the Applied Psychology Panel and in the Navy itself. I don't think that any individual organization at present could afford to do a thorough job on many tests. I do think that the consumers have to be educated to being willing to pay more for tests which are certified by some central impartial agency in which they can have faith.

DR. ZUBIN: Mr. Chairman, this plan of Dr. Conrad's seems to coincide with the plan of the Committee on Diagnostic Devices of the Clinical Section of the American Psychological Association, which is attempting to do something about the plethora of personality tests that are flooding the market. Not much can be done about the tests already in existence, but the proposal of this committee is that whenever a new test arrives (and they arrive, as Dr. Shaffer said, almost one a month), ten centers be selected throughout the country which will attempt to duplicate the results obtained by the test-maker on a similar population. If the test claims are found to be valid, an unofficial stamp of approval might be given by this committee.

The field of personality tests is a little different from the field of educational achievement tests because the composition of the population dealt with very largely determines the outcome. We might expect the first two or three try-outs of this method to arrive at contrary results. In such instances I hope that we could obtain the data in the various centers and see why the test worked well in Center A and not in Center B. This could supply us with information not only about the test but also about the comparability of patient classification in various centers.

I want to take this opportunity to comment on Professor Shaffer's statement regarding the relative values of interviews and tests. I think the skills of interviewing are in danger of vanishing in the psychological field. There seems to be a growing tendency to place less and less reliance on the interview and to depend more and more on such diagnostic tools as projective tests. There has been too much dependency on ink blots and similar devices at the expense of the basic approach to understanding personality through the interview. This is a good example of placing the cart before the horse. Rorschach himself, for example, based his unfinished attempt at the validation of his test on interviews with patients conducted either by himself or by the referring psychiatrist.
I think this attempt at hiding behind unvalidated tests is very unfortunate. It seems to me that we ought to recapture the kind of procedures that Colonel Bingham wrote about fifteen or twenty years ago in his book on interviewing and, instead of hiding behind tests which in turn depend upon interviews, standardize the interview itself through recordings and modern rating-scale techniques. Then, if the diagnostic aids are found to be useful, all the more power to them. But until the evaluation of the interview itself is standardized, there is little hope of validating the tests which depend on it for their validation.

CHAIRMAN BUROS: Are there any other comments?

DR. MANUEL: I want to make a point that I think has not been made sufficiently well. We have been talking about what a publisher ought to put in his manual, as if the consumers were to be members of this group or members of other groups similarly trained. Well, that just isn't true. That is unrealistic.

We need something more than the things that have been emphasized to this point. It is not hard to get such validity data and reliability data as we have been talking about, but it is an exceedingly difficult thing to explain the materials to the consumer in terms that he can understand. I am not speaking merely of the elementary teacher and the high school teacher, although I am speaking of them; I am speaking also of many counselors.

It happens that I have something to do with a counseling bureau at the university level. We have very intelligent people among our group, but one thing with which we have to struggle constantly is to get some common-sense interpretation of what a test means, plus what it shows in a case under consideration.

One other point: I think we are overdoing this matter of prediction as the objective of tests. I cannot agree with the implication that if you give an ACE, you will not need a test of English for the same population. We have plenty of evidence now, as far as that goes, that in some cases you will get about as good a correlation with the Q score as with the L score, but does that mean we should not have both? The answer depends on what you want to do with the scores. In many situations it is not merely a matter of looking ahead to see what the individual will probably do, but of helping him to get a program that will fit his needs. Frequently we need tests that, so far as correlations are concerned, do not give us anything more than some test already administered has given, but do give us a better idea of what the individual is.

MR. LENNON: In all these castigations of the publisher and author, it seems to me I detect a premise that the more information which is provided about a test in the way of additional reliability and validity data, norms, and so on, the more effectively the test can be used or will be used. While I think in principle one might agree with that, in practice it seems to me that that is only true within limits, and very decidedly it is true in relation to the competence and the sophistication of the test user.

We have to diiiiiiguish in our 
thinking among qrpet o( tests with 
rctpect to the amount and complenty 
of infortnatioo which k Ukelf to be 
used with wi^iom, prudence, and 
guod lenvr bjr the ordinary uier of a 
given type of test* 

To make this specific, we have had a little experience in providing several varieties of normative data for elementary school achievement tests, and a very common reaction is one of confusion on the part of the test user as to which kind of norms he should use or can use most effectively. It is the exception where we find that there is an enrichment of interpretation or an improvement in the use of the test by virtue of having the multiplicity and proliferation of normative data.

I would not want to suggest that we should discontinue any efforts to get additional data about tests; certainly not. But what I am saying is that to provide this additional information, which without doubt is very useful, maybe, to people such as the people of this group, who are sensitive to the issues involved and can use the additional information well, the cost of the product must be increased, with no corresponding increase in the effectiveness with which the ordinary user can use the instrument (and I am thinking mostly of elementary school teachers), at an increased cost to them. There is a limit to the amount of data that the users of given types of tests can digest and use well, and that is certainly one limiting factor, strictly from an economic standpoint, on the extent to which an author and publisher can go along with the desire for added information.

I want to add another observation on Dr. Conrad's suggestion that the publisher and the author accept a responsibility for preventing their tests from being abused, again thinking of the analogy with the drug dispenser or the pharmaceutical manufacturer who label their wares with cautionary notes, contraindications, and so on. Well, the number of ways in which a test or test score can be abused is legion. We certainly could make a catalog of the things that a user should not try to do with an elementary achievement test, an aptitude test, or an intelligence test. But I have a feeling that if we were to catalog all those things and lay them in the manual, "Now, don't try to do this, don't try to do the other with this test," we would be pretty much in the position of telling kids not to put beans in their ears. Most of them never would have thought of these things unless we had put it in their minds in the first place. There are certain gross mistakes which you know from experience are commonly made with certain types of tests. Well, naturally, you safeguard against those. But I







trying to anticipate all the foolish or mischievous things that an uninformed test user is likely to do.

If I can add just one final word, again representing the vested interests, I would like to observe (and this stems from repeated experience in these sessions on what publishers ought to do) that I think the sessions would be far more productive if, instead of damning publishers and authors more or less generically, and then saying, "Oh, we don't mean you and we don't mean you," when Seashore and Driscoll get...

CHAIRMAN BUROS: By the way, I didn't go down the line very far. I just warn you...

MR. LENNON: That's all right. Maybe you went down the line as far as we can go. But let's start with the next publisher or author on the list and say, "Now, this is what we mean," and if we are going to damn, let's damn in a concrete and not in a generic form, so that those named will have a chance to speak to the point.

CHAIRMAN BUROS: Dr. Conrad, would you care to comment on this at all?

DR. CONRAD: Well, I think that a distinction has to be made between the test user and the test purchaser. The elementary school teacher is frequently an agent who uses the test at the request of her principal or of her superintendent. It is quite possible that the teacher can be given a very much simplified cookbook to go with the test. But if the purchaser, the superintendent or the principal, is going to make an intelligent [illegible], then he has to [illegible] intelligence. Unless the [illegible] by salesmanship instead of by merit. Therefore, I would suggest that while it is impossible, or not desirable, to put a whole book into each manual, nevertheless that information should be set up as an assurance that the author knows what he is talking about and that the publisher knows what he is trying to sell.

CHAIRMAN BUROS: Do you want to talk to Dr. Conrad's remarks, Dr. Lorge?

DR. LORGE: I think the point which has just been made should be an indication of why we are having so much difficulty. If all the information that you needed for every test were provided in the manual, you would find that it would be impossible for any one human being to read all of it. I would suggest, therefore, that the next thing we do with every test manual is to include with it a reading test. We could find out the degree to which they can get the general, over-all view: What is this test about? They could get specific details of where the information about the test is in the manual, and then perhaps check the kind of inferences the test user can make.

It seems, certainly, from the kind of talk that has been going on here today, that either the people do not know what they are talking about or they do not know how to move from one level to another. The idea of having an expectancy table for a [illegible] is a rather ingenious one, but I think if you knew what a correlation coefficient is and does, you should be able to move to an expectancy table, if you have any relationships reported. If the relationships are not of the quantitative but of the qualitative form, then I can see little point in the expectancy table.

It seems to me that we are trying to do too many things with too many people at the same time.

The last point I would like to bring to the attention of the group is that there is a serious problem here in public relations. We have assumed, because there is a group of people who call themselves "ethical" pharmacists, or ethical pharmaceutical manufacturers, that they are ipso facto ethical. Let me assure you that the number of people who misuse drugs is much greater than the number of people who misuse tests.

CHAIRMAN BUROS: I would like to check with the members of my panel to find out whether any of them have any further comments they would like to make. Would you like to make any comments, Dr. Dressel?

DR. DRESSEL: One final one, perhaps. I think the matter that came up about the use of an English test when a Scholastic Aptitude Test is already being used is a rather important thing. It fits right into the philosophy that I was trying to bring out in my talk. I cannot visualize the English Department on my campus being at all satisfied with any placement or sectioning based on a Scholastic Aptitude Test, even if it shows a higher correlation with grades in English than an English test. I think we might also ask the question: if it becomes evident that a Scholastic Aptitude Test does correlate more highly with grades, may there not be something wrong with what is being done in that English course?

CHAIRMAN BUROS: Dr. Shaffer, do you care to make any additional remarks?

DR. SHAFFER: Throughout all of our discussion today there has been agreement with a pervasive but unacknowledged philosophy. We want tests that are practical, that will work, and that will correlate with definite criteria. The recent celebration of the ninetieth birthday of John Dewey brings to mind the source of our philosophy. We were all nurtured on John Dewey.

The two contrary attitudes about personality tests that I mentioned earlier arise from different basic sets of values. One view holds that personality can only be experienced esthetically, or perhaps even poetically; that there are some truths about people that transcend criteria or practical applications. A similar philosophy can be found in some expressed views about achievement tests. It may be that English composition, or some other subject matter, is worth while in its own right, without regard to whether it correlates with or will predict anything else.

In both personality tests and achievement tests our support of a pragmatic position, our insistence that







tests must be useful and valid in the sense that they correlate with criteria, is an expression of our philosophic values. We like those values, and acknowledge our intellectual descent from John Dewey and William James. But other cultures and other eras may adopt other philosophies, and we must admit that we do not necessarily have the ultimate truth.

CHAIRMAN BUROS: I wonder if there is a chance, Dr. Kelley, that you would care to make a few remarks?

DR. KELLEY: Naturally, I am very interested in almost everything that has been said. Practically all of it has touched me personally in one way or another. I certainly added my share of matches in trying to put the heat on, and Walter [illegible] has been one of my supporters in the matter. We have tried to put the heat on, and tried to get not only authors and publishers but users as well to depend upon information about reliability and validity that they were not accustomed to depend upon.

I don't know how successful we have been. Certainly, in a group like this, I know without a doubt that you people have a dependence upon phenomena of reliability and validity that could not be paralleled when the testing movement started, [illegible] Oscar Buros said now, because I think it started some little time ago. It started back in the days of Thorndike and Hillegas, and even earlier. But you depend upon evidence of reliability and validity, probability distributions, and so forth, to a degree and to an extent which, to me, is very encouraging. It is a degree and an extent that did not exist not so very many years ago. If you depend upon it, it is going to spread. It is going to spread to publishers and test users.

Now, just how much the user can digest, I do not know. I would like to mention one little device, and then ask Dr. Durost, perhaps, whether the public has digested it. In an early form of the Stanford Achievement Test, we had a profile, and we put on that profile a little bar that indicated the size of the probable error of a score. That was the first time that was used. I remember that the publisher, the World Book Company, said, "Well, we don't want that. Nobody knows what it means." I said, "It is rather inconspicuous; it isn't going to be very serious, so just put it in." We had these probable errors on the profile, and the score at this point had this probable error, and the score at that point had this one. Today, I would like to do it over again and have standard errors, but, anyway, we had those. I would like to ask Walter Durost: has that been digested?

DR. DUROST: I am afraid that the answer to that is definitely "No," but the bar is still there. We hope that some day somebody will pay attention.

DR. KELLEY: I think, perhaps, Dr. Buros, I will not make any further remarks. One can talk almost endlessly, but we have had such a fine session of discussion that it is not necessary to go on at all. I thank you.
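Dr. Kelley's profile bar marked the probable error of each score, and he says he would now prefer standard errors. When errors of measurement are taken to be normally distributed, the two differ only by a constant: the probable error is about 0.6745 times the standard error, so a probable-error bar covers roughly the middle half of the errors and a one-standard-error bar roughly 68 per cent. The sketch below is an illustration under stated assumptions, not a reconstruction of the Stanford profile: it assumes the classical formula SEM = SD * sqrt(1 - reliability), which the discussion does not give, and the function names and subtest figures (SD 10, reliability .91, score 50) are hypothetical.

```python
import math

# Assumption: normally distributed errors of measurement, for which the
# probable error is about 0.6745 standard errors (about 50% coverage).
PE_PER_SE = 0.6745


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Assumed classical-test-theory SEM: SD * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def profile_bar(score: float, sd: float, reliability: float, kind: str = "pe"):
    """Hypothetical helper: (low, high) ends of an error bar around a score.

    kind="pe" draws a probable-error bar; kind="se" a one-standard-error bar.
    """
    sem = standard_error_of_measurement(sd, reliability)
    if kind == "pe":
        half_width = PE_PER_SE * sem
    elif kind == "se":
        half_width = sem
    else:
        raise ValueError("kind must be 'pe' or 'se'")
    return (score - half_width, score + half_width)


# Hypothetical subtest: SD 10, reliability .91, observed score 50 (SEM = 3).
print(profile_bar(50, 10, 0.91, "pe"))  # roughly (47.98, 52.02)
print(profile_bar(50, 10, 0.91, "se"))  # roughly (47.0, 53.0)
```

Switching from probable errors to standard errors, as Dr. Kelley wished, only widens each bar by the same factor (about 1.48); the relative widths across subtests on the profile stay the same.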






CHAIRMAN BUROS: Thank you very much, Dr. Kelley. I should like to leave just one thought with you, and that is, there is a danger here when we set up the objective of what we should like test publishers and test authors to supply. We are mentioning a lot of particular things which we would like to have; we are not mentioning all, but quite a large number. Just because you cannot supply us with all of this information is certainly no argument that you should continue your past practices. You can make progress in the direction of giving more adequate information than you are doing now, and I should be glad to make suggestions to publishers as to how they can give us some information, which they possess in their offices, at very little if any extra expense.

It has been very difficult for me to keep out of this afternoon meeting more than I have. I just seem to want to talk all the time. I should be very glad to have one of these bureaus of test evaluation set up so that it would permit me to take a rest and not try to put out any more Mental Measurements Yearbooks. I should be very glad to go out of business.



Appendix

PARTICIPANTS, 1949 INVITATIONAL CONFERENCE ON TESTING PROBLEMS



CAiKit» Berrartit M^ffOfwIit*a Lift h\M* 

WuJiUlflQfl 



ANDERSON, Roy N., North Carolina State College



ARNOLD, Samuel T., Brown University
ARSENIAN, Seth, Springfield College

tufe c>f BrooAlyn 
BECK, Hubert P., City College of New York

BENNETT, George K., Psychological Corporation

BiNioN, Arthur L., lUinca^ionAl Tntirtf 

BEftCP^en^ B E,» jr.» Kdocationi) T^ini^ 
Setvict 

BcRNnciNi Atftt<f J., QnfefM r< **f|{« 

BINGHAM, Walter V., Washington, D.C.

B4H>r^TAviiR, J, B » T^Aficck, New Jfffey 
BOWLES, Frank H., College Entrance Examination Board
Braniit, liymAif, Ptt%<inf\f\ Rr*rarth Vr* 

fiun (AGO) 
IlRANUORi^ l"bonijy| L , N York 

B<nr«l of P>HMCAli»>n 
Brtak, Mittjiin M , Silver Bufid^v 1 V^f^^ 



CoHtH^ Eltubnh, Edu<«*^m)rl Tcftm^^ 

COHN, Fannia M., International Ladies' Garment Workers' Union
CONRAD, Herbert S., Office of Education

COPELAND, Herman A., Atlantic Refining Company

CORNELL, Ethel L., New York State Education Department
CoWAMJD, Ar F» E<JucA>iitr»Af 'letting 

Senrit* 

Co*v».AV Jfohfi T, EH^fionAJi T«rti«i| 
ilenr ice 

Ch/a^c, ^;nr F,, Univttiiry ol M»j!ie 
CRISSY, William J. E., Queens College

i'VTti^ Norrm E., V«l< Untvenity 
ilYHkmoKt MA/i«fl| Brooklyn CoHrf^ 

DAVIS, Allison, University of Chicago

Sfrv»cr 
Women 

Ilirnpih^rr 

A<>^«* M , FJi.« ii*»«nii| 




•(>•», H , VVAilikAgKM HMXHrr/* M(tt«i PetwMArt lUmtcl! 

„ „ . . . / H4mcic»« I j1k«i»u. Uftiv«i*.»T of nu 
^•H, Unbrti L. Su** I liwn^tt ttt !«»• 

EYSENCK, H. J., University of London
H^vioMtnwr, Itoton I. UoifvnUT

l^rMt, M«m»r, Prm>f>Mt l(m«Rji Kie40*mibl^ AUwn N,, UtlwrwN 

rru, W.lUuiv Coltri* 8«t/*ivrr R^*ini Kor»«««i*» E, trf, K4u<«»ioMl T«»inrf 



F^Mrt, KMhryn 8, 8i*vK^,i ^ml Oia^v Hoiiiw*, Cwtel^ UkUni $^ UPAF 

Or^4AiM^ MKkiH* Corp 

Mhttn ttuof*. M'^rrkf, Vol* Tia*wi 

FuMfM, 1kni$m^K /, Nm VWa Sfitf ^ftif S^rritt 

fntsr.H, John w, ff«W»^ionii,i TrHiAi WMhintiof^ 

f i P^mt»fi?< Ptm^rim) 8nr«ftjb <^V^K* P 4 C^^f CoHe^t c< 

^I«<W fACifVj) ^ ^^^ . , 






HtUvi^iM Aiii>Aidr7 Ci>»**R,^^ 
TVf I 

Mnift, t^n'^t A , Vn#f4M Ad^in^^e*- 
hah , W Ihnffn Homttt^ PiY^vMrna 




Ptiir, WaU#itn Uot^tiiit of Nonh 

Frrtuo^K, DoDAld A. Uft tm<ir«i^« A- 
(TficT Mtiufitmtit Ai»rk*iiiiOfi 

Sf r%k« 

Scnrkf 

PnMMt««, M. fBotAn Pord<M Unbmitt 
RrTHotry, Cforff, BolUo| Air f or« l4jif 
Hiitiiim. Htnrf^ iUitmtKxiAl Taxing 
Srtvke 

Hkh*. jtimw II I |r , PiyrfcotofKit Cofpo 

Ru( tin; Kfrirwn fcdtt«iuo«»i Tt^ing St* v« 
k* 

ROCK, Robert T., Fordham University
Rofiv.f, ftdbnl L tl , 5iAiv4j;jfd Oil C^wt. 

pMf of Jmry 
RULON, Phillip J., Harvard University

mm Muhtn^ V0$pGt%fi<&n 

of Nm^^fig Pi*iu<M^*« r 
^v^T» . A<hn B , tltii^)rifVi>f nl > <»f»f./vrr. 

S:'/r^r, <*-i.ii„'*^„ |lB-i'>i.,« |» ♦<-)Hr.(i^- .,5 ^--^i^ ■ 



^rtHU, [Voii<2 f iiirAHMd T«sltil^t TmrtM^ M»v^iief f. , SyraoM* U»iVmitf 

«nt Srnrk* A^d llliooai tMirtvit of VoUinUi llsimwtf 
j)itffHt*tm<« WiUiun, Ur tyrrwty «f / School of U^cilicM* 
%tfw.%^. H^i S . Awrxw jtw^li C««- wmt/ F O:. Vni^.^tj o( fcnrtlirro 

iwmb^ Uft»%mi>T Wt>at^ Btmko Mh Colkfr 

SWINEFORD, Frances, Educational Testing Service
WESMAN, Alexander G., Psychological Corporation

itMwuH* PVmvd Mm TmcWn ColWff,, Wnnrw. AWffd 0^ Uh iMwt M»». 

Col«a)4H4i UoitmifT / WlMOOi AlfiWI Im« VKrgloiA Suftr Drpo^* 

T«»oiiif 0»vi4 V , tl«n]i<4 Unhrti^ty mor' o( t jtoo ri ooi 

ToiMMVr, WAivtm 9^ l4jiiKfuoMl A*cv^ WotMAiti ttnlMiio, TfWAviv, land 

ini p-' ikr / Wooo, lUf ClL« Okki Sia«r Drportmrvt of 

Pckmi M W « MttofTf Colkff UmMm 

t ^ * « wk; A$ikuf C , IUIwci^t>o«^/i«coff4« Wairiirmwifft J. W*pr^ Nr« Yori Ro«H 

of EdocAiioo 






